Parallel Append implementation
Currently an Append plan node does not execute its subplans in
parallel. There is no distribution of workers across its subplans. The
second subplan starts running only after the first subplan finishes,
although the individual subplans may be running parallel scans.
Secondly, we create a partial Append path for an appendrel only if
all of its member subpaths are partial paths. If one or more of the
subplans has only a non-partial path, there will be only a
non-parallel Append. So whatever node sits on top of the Append
cannot use a parallel plan; for example, a select count(*) won't be
divided into partial aggregates if the underlying Append is not
partial.
The attached patch removes both of the above restrictions. There has
already been a mail thread [1] that discusses an approach suggested by
Robert Haas for implementing this feature. This patch uses that same
approach.
Attached is pgbench_create_partition.sql (derived from the one
included in the above thread), which distributes the pgbench_accounts
table data into 3 partitions pgbench_accounts_[1-3]. The queries below
use this schema.
Consider a query such as :
select count(*) from pgbench_accounts;
Now suppose these two partitions do not allow a parallel scan :
alter table pgbench_accounts_1 set (parallel_workers=0);
alter table pgbench_accounts_2 set (parallel_workers=0);
On HEAD, because some of the partitions have only non-parallel scans,
the whole Append runs serially :
Aggregate
-> Append
-> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts
-> Seq Scan on pgbench_accounts_1
-> Seq Scan on pgbench_accounts_2
-> Seq Scan on pgbench_accounts_3
Whereas, with the patch, the Append looks like this :
Finalize Aggregate
-> Gather
Workers Planned: 6
-> Partial Aggregate
-> Parallel Append
-> Parallel Seq Scan on pgbench_accounts
-> Seq Scan on pgbench_accounts_1
-> Seq Scan on pgbench_accounts_2
-> Parallel Seq Scan on pgbench_accounts_3
Above, Parallel Append is generated, and it executes all these
subplans in parallel, with 1 worker executing each of the sequential
scans, and multiple workers executing each of the parallel subplans.
======= Implementation details ========
------- Adding parallel-awareness -------
In a given worker, this Append plan node will execute just like the
usual partial Append node: it will run a subplan to completion. The
subplan may or may not be a partial parallel-aware plan such as a
parallel scan. After the subplan is done, Append will choose the next
subplan. This is where it differs from the current partial Append
plan: it is parallel-aware. The Append nodes in the workers will be
aware that there are other Append nodes running in parallel, and will
have to coordinate with them while choosing the next subplan.
------- Distribution of workers --------
The coordination info is stored in a shared array, each element of
which describes the per-subplan info. This info contains the number of
workers currently executing the subplan, and the maximum number of
workers that should be executing it at the same time. For non-partial
subplans, max workers would always be 1. To choose the next subplan,
the Append executor sequentially iterates over the array to find a
subplan with the fewest workers currently executing it, AND which is
not already being executed by the maximum number of workers assigned
to it. Once it finds one, it increments that subplan's current worker
count, and releases the spinlock, so that other workers can choose
their next subplan if they are waiting.
This way, workers are distributed fairly across subplans.
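For reference, here is the selection logic, condensed from
parallel_append_next() in the attached patch (asserts and debug
logging elided) :

static bool
parallel_append_next(AppendState *state)
{
    ParallelAppendDesc padesc = state->as_padesc;
    int         whichplan;
    int         min_whichplan = PA_INVALID_PLAN;
    int         min_workers = -1;

    SpinLockAcquire(&padesc->pa_mutex);

    for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
    {
        parallel_append_info *painfo = &padesc->pa_info[whichplan];

        /* Skip subplans already finished (-1) or already at max workers */
        if (painfo->pa_num_workers == -1 ||
            painfo->pa_num_workers == painfo->pa_max_workers)
            continue;

        /* Remember the subplan with the fewest workers seen so far */
        if (min_whichplan == PA_INVALID_PLAN ||
            painfo->pa_num_workers < min_workers)
        {
            min_whichplan = whichplan;
            min_workers = painfo->pa_num_workers;
        }
    }

    /* Claim the chosen subplan, if any, before letting others choose */
    if (min_whichplan != PA_INVALID_PLAN)
        padesc->pa_info[min_whichplan].pa_num_workers++;
    state->as_whichplan = min_whichplan;

    SpinLockRelease(&padesc->pa_mutex);

    return (min_whichplan != PA_INVALID_PLAN);
}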
The shared array needs to be initialized and made available to
workers. For this, we can do exactly what sequential scan does for
being parallel-aware : have a function ExecAppendInitializeDSM(),
similar to ExecSeqScanInitializeDSM(), allocate the array in the
backend, and have ExecAppendInitializeWorker() retrieve the shared
array in each worker.
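Condensed from the attached patch, the estimate and worker-side hooks
mirror their sequential scan counterparts (ExecAppendInitializeDSM(),
which fills in the array, is sketched in a later section) :

void
ExecAppendEstimate(AppendState *node, ParallelContext *pcxt)
{
    /* One mutex followed by an array of per-subplan entries */
    node->pappend_len =
        add_size(offsetof(struct ParallelAppendDescData, pa_info),
                 sizeof(*node->as_padesc->pa_info) * node->as_nplans);
    shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
    shm_toc_estimate_keys(&pcxt->estimator, 1);
}

void
ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
{
    /* Locate the shared array set up by the backend ... */
    node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
    /* ... and choose this worker's first subplan. */
    (void) parallel_append_next(node);
}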
-------- Generating Partial Append plan having non-partial subplans --------
In set_append_rel_pathlist(), while generating a partial path for
Append, also include the non-partial child subpaths besides the
partial subpaths. This way, the path can contain a mix of partial and
non-partial child paths, but for a given child, its path would be
either the cheapest partial path or the cheapest non-partial path.
A non-partial child path is included only if it is parallel-safe. If a
child has no parallel-safe path, a partial Append path is not
generated at all. This behaviour also automatically prevents paths
that have a Gather node beneath.
Finally, when it comes to creating a partial Append path using these
child paths, we also need to store a bitmapset indicating which of
the child paths are partial paths. For this, have a new Bitmapset
field : Append->partial_subplans. At execution time, this will be used
to set the maximum workers for each non-partial subpath to 1.
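At DSM initialization time this bitmapset translates into the
per-subplan worker limits; condensed from ExecAppendInitializeDSM()
in the attached patch :

for (i = 0; i < node->as_nplans; i++)
{
    /* Workers bump this count as they pick up the subplan. */
    padesc->pa_info[i].pa_num_workers = 0;

    /*
     * A partial subplan can be shared by any number of workers, whereas
     * a non-partial subplan must be run by exactly one worker.
     */
    padesc->pa_info[i].pa_max_workers =
        bms_is_member(i, ((Append *) node->ps.plan)->partial_subplans) ?
        max_parallel_workers_per_gather : 1;
}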
-------- Costing -------
To calculate the per-worker cost of a parallel Append path, we first
compute the total of the child subplan costs taking all of their
workers into account, and then divide it by the Append node's
parallel_divisor, similar to how a parallel scan uses this
"parallel_divisor".
For the startup cost, it is assumed that Append would start returning
tuples when the child node having the lowest startup cost is done
setting up. So the Append startup cost is equal to the minimum startup
cost among its children.
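As a quick illustration of the divisor (using get_parallel_divisor(),
which the patch factors out of cost_seqscan()) : with 3 planned
workers, the leader's contribution is estimated as 1.0 - (0.3 * 3) =
0.1, giving a parallel_divisor of 3.1; a subplan whose total cost
summed across all participants is 310 would thus be costed at 100 per
participant.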
-------- Scope --------
There are two different code paths where an Append path is generated.
1. One is where an append rel is generated : inheritance tables, and
the UNION ALL clause.
2. The second code path is in prepunion.c. This gets executed for UNION
(without ALL) and INTERSECT/EXCEPT [ALL]. The patch does not support
Parallel Append in this scenario; it can be taken up later as an
extension, once this patch is reviewed.
======= Performance =======
There is a clear benefit from Parallel Append in scenarios where one
or more subplans don't have partial paths, because in such cases HEAD
does not generate a partial Append at all. For example, the query
below took around 30 secs with the patch
(max_parallel_workers_per_gather should be 3 or more), whereas it
took 74 secs on HEAD.
explain analyze select avg(aid) from (
select aid from pgbench_accounts_1 inner join bid_tab b using (bid)
UNION ALL
select aid from pgbench_accounts_2 inner join bid_tab using (bid)
UNION ALL
select aid from pgbench_accounts_3 inner join bid_tab using (bid)
) p;
--- With HEAD ---
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=6415493.67..6415493.67 rows=1 width=32) (actual
time=74135.821..74135.822 rows=1 loops=1)
-> Append (cost=1541552.36..6390743.54 rows=9900047 width=4)
(actual time=73829.985..74125.048 rows=100000 loops=1)
-> Hash Join (cost=1541552.36..2097249.67 rows=3300039
width=4) (actual time=25758.592..25758.592 rows=0 loops=1)
Hash Cond: (pgbench_accounts_1.bid = b.bid)
-> Seq Scan on pgbench_accounts_1
(cost=0.00..87099.39 rows=3300039 width=8) (actual time=0.118..778.097
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16 rows=50000016
width=4) (actual time=24426.433..24426.433 rows=49999902 loops=1)
Buckets: 131072 Batches: 1024 Memory Usage: 2744kB
-> Seq Scan on bid_tab b (cost=0.00..721239.16
rows=50000016 width=4) (actual time=0.105..10112.995 rows=49999902
loops=1)
-> Hash Join (cost=1541552.36..2097249.67 rows=3300039
width=4) (actual time=24063.761..24063.761 rows=0 loops=1)
Hash Cond: (pgbench_accounts_2.bid = bid_tab.bid)
-> Seq Scan on pgbench_accounts_2
(cost=0.00..87099.39 rows=3300039 width=8) (actual time=0.065..779.498
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16 rows=50000016
width=4) (actual time=22708.377..22708.377 rows=49999902 loops=1)
Buckets: 131072 Batches: 1024 Memory Usage: 2744kB
-> Seq Scan on bid_tab (cost=0.00..721239.16
rows=50000016 width=4) (actual time=0.024..9513.032 rows=49999902
loops=1)
-> Hash Join (cost=1541552.36..2097243.73 rows=3299969
width=4) (actual time=24007.628..24297.067 rows=100000 loops=1)
Hash Cond: (pgbench_accounts_3.bid = bid_tab_1.bid)
-> Seq Scan on pgbench_accounts_3
(cost=0.00..87098.69 rows=3299969 width=8) (actual time=0.049..782.230
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16 rows=50000016
width=4) (actual time=22943.413..22943.413 rows=49999902 loops=1)
Buckets: 131072 Batches: 1024 Memory Usage: 2744kB
-> Seq Scan on bid_tab bid_tab_1
(cost=0.00..721239.16 rows=50000016 width=4) (actual
time=0.022..9601.753 rows=49999902 loops=1)
Planning time: 0.366 ms
Execution time: 74138.043 ms
(22 rows)
--- With Patch ---
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2139493.66..2139493.67 rows=1 width=32)
(actual time=29658.825..29658.825 rows=1 loops=1)
-> Gather (cost=2139493.34..2139493.65 rows=3 width=32) (actual
time=29568.957..29658.804 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Partial Aggregate (cost=2138493.34..2138493.35 rows=1
width=32) (actual time=22086.324..22086.325 rows=1 loops=4)
-> Parallel Append (cost=0.00..2130243.42
rows=3299969 width=4) (actual time=22008.945..22083.536 rows=25000
loops=4)
-> Hash Join (cost=1541552.36..2097243.73
rows=3299969 width=4) (actual time=29568.605..29568.605 rows=0
loops=1)
Hash Cond: (pgbench_accounts_1.bid = b.bid)
-> Seq Scan on pgbench_accounts_1
(cost=0.00..87098.69 rows=3299969 width=8) (actual time=0.024..841.598
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16
rows=50000016 width=4) (actual time=28134.596..28134.596 rows=49999902
loops=1)
Buckets: 131072 Batches: 1024
Memory Usage: 2744kB
-> Seq Scan on bid_tab b
(cost=0.00..721239.16 rows=50000016 width=4) (actual
time=0.076..11598.097 rows=49999902 loops=1)
-> Hash Join (cost=1541552.36..2097243.73
rows=3299969 width=4) (actual time=29127.085..29127.085 rows=0
loops=1)
Hash Cond: (pgbench_accounts_2.bid = bid_tab.bid)
-> Seq Scan on pgbench_accounts_2
(cost=0.00..87098.69 rows=3299969 width=8) (actual time=0.022..837.027
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16
rows=50000016 width=4) (actual time=27658.276..27658.276 rows=49999902
loops=1)
-> Seq Scan on bid_tab
(cost=0.00..721239.16 rows=50000016 width=4) (actual
time=0.022..11561.530 rows=49999902 loops=1)
-> Hash Join (cost=1541552.36..2097243.73
rows=3299969 width=4) (actual time=29340.081..29632.180 rows=100000
loops=1)
Hash Cond: (pgbench_accounts_3.bid = bid_tab_1.bid)
-> Seq Scan on pgbench_accounts_3
(cost=0.00..87098.69 rows=3299969 width=8) (actual time=0.027..821.875
rows=3300000 loops=1)
-> Hash (cost=721239.16..721239.16
rows=50000016 width=4) (actual time=28186.009..28186.009 rows=49999902
loops=1)
-> Seq Scan on bid_tab bid_tab_1
(cost=0.00..721239.16 rows=50000016 width=4) (actual
time=0.019..11594.461 rows=49999902 loops=1)
Planning time: 0.493 ms
Execution time: 29662.791 ms
(24 rows)
Thanks to Robert Haas and Rushabh Lathia for their valuable inputs
while working on this feature.
[1] Old mail thread : /messages/by-id/9A28C8860F777E439AA12E8AEA7694F80115DEB8@BPXM15GP.gisp.nec.co.jp
Thanks
-Amit Khandekar
Attachments:
ParallelAppend.patch
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 8a6f844..ad9ad92 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeSeqscan.h"
@@ -199,6 +200,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -247,6 +252,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecSeqScanInitializeDSM((SeqScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
@@ -724,6 +733,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
case T_SeqScanState:
ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a26bd63..a6d2d63 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct
+{
+ int pa_num_workers; /* workers currently executing the subplan */
+ int pa_max_workers; /* max workers that should run the subplan */
+} parallel_append_info;
+
+struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
+};
+
+typedef struct ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * For Parallel Append, AppendState->as_whichplan can have PA_INVALID_PLAN
+ * value, which indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static void exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +115,22 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ return parallel_append_next(appendstate);
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +232,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +251,14 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we have already finished all plans from parallel append */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+ MyProcPid);
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +280,17 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We have got NULL. There might be other workers still processing the
+ * last chunk of rows for this same node, but there's no point for new
+ * workers to run this node, so mark this node as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +334,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +349,233 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ * pa_max_workers is already set initially.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ int i;
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ for (i = 0; i < node->as_nplans; i++)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan in workers will take care of everything
+ * else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+
+ /* Is this a partial subplan ? */
+ if (bms_is_member(i, ((Append*)node->ps.plan)->partial_subplans))
+ {
+ /*
+ * We are distributing workers equally among subplans. So, just set
+ * the max_workers to maximum possible value.
+ */
+ padesc->pa_info[i].pa_max_workers =
+ max_parallel_workers_per_gather;
+ }
+ else
+ {
+ /*
+ * Non-partial plan essentially needs to be run by one and only
+ * one worker.
+ */
+ padesc->pa_info[i].pa_max_workers = 1;
+ }
+ }
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing this
+ * node will continue to do so, but workers looking for next nodes to pick
+ * up would skip this node after this function is called. It is possible that
+ * multiple workers call this function for the same node at the same time,
+ * because these workers were executing the same node and they finished with
+ * it at the same time. The spinlock is not for this purpose. The spinlock is
+ * used so that it does not change the pa_num_workers field while workers are
+ * choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ elog(DEBUG2, "Parallelappend : pid %d : finishing plan %d",
+ MyProcPid, whichplan);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ parallel_append_info *painfo = &padesc->pa_info[whichplan];
+
+ /* Ignore plans that are already done processing */
+ if (painfo->pa_num_workers == -1)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d"
+ " since pa_num_workers is -1",
+ MyProcPid, whichplan);
+ continue;
+ }
+
+ /* Ignore plans that are already being processed by max_workers */
+ if (painfo->pa_num_workers == painfo->pa_max_workers)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d,"
+ " since reached max_worker count %d",
+ MyProcPid, whichplan, painfo->pa_max_workers);
+ continue;
+ }
+
+ /* Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
+ }
+
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ {
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+ }
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+ elog(DEBUG2, "ParallelAppend : pid %d : Chosen plan : %d",
+ MyProcPid, min_whichplan);
+
+ /*
+ * If we didn't find any node to work on, it means each subplan is either
+ * finished or has reached its pa_max_workers. In such a case, should this
+ * worker wait for some subplan to have its worker count drop below its
+ * pa_max_workers so that it can choose that subplan ? It turns out that
+ * it's not worth again finding a subplan to work on. Non-partial subplan
+ * anyway can have only one worker, and that worker will execute it to
+ * completion. For a partial subplan, if at all it reaches pa_max_workers,
+ * its worker count will drop only when its workers find that there is
+ * nothing more to be executed, so there is no point taking up such a node
+ * when its worker count drops. In conclusion, just stop executing once we
+ * don't find nodes to work on. Indicate the same by returning false.
+ */
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d973225..ee9c640 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -217,6 +217,7 @@ _copyAppend(const Append *from)
/*
* copy remainder of node
*/
+ COPY_BITMAPSET_FIELD(partial_subplans);
COPY_NODE_FIELD(appendplans);
return newnode;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7258c03..73f47cc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -358,6 +358,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
+ WRITE_BITMAPSET_FIELD(partial_subplans);
WRITE_NODE_FIELD(appendplans);
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d608530..7f1c2e1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1520,6 +1520,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
+ READ_BITMAPSET_FIELD(partial_subplans);
READ_NODE_FIELD(appendplans);
READ_DONE();
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9753a26..62eefdb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -98,7 +98,8 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
-static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_append_subpath(List *subpaths, Path *path,
+ Bitmapset **partial_subpaths_set);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1126,6 +1127,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bool subpaths_valid = true;
List *partial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ Bitmapset *partial_subpath_set = NULL;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1185,14 +1187,52 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path,
+ NULL);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
- linitial(childrel->partial_pathlist));
+ linitial(childrel->partial_pathlist),
+ &partial_subpath_set);
+ }
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
+ Path *parallel_safe_path = NULL;
+ foreach(lcp, childrel->pathlist)
+ {
+ Path *child_path = (Path *) lfirst(lcp);
+ if (child_path->parallel_safe &&
+ child_path->param_info == NULL)
+ {
+ parallel_safe_path = child_path;
+ break;
+ }
+ }
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path, NULL);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1267,7 +1307,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, subpaths,
+ NULL, NULL, 0));
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1278,23 +1319,32 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
ListCell *lc;
int parallel_workers = 0;
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
+ /* Decide on the number of workers to request for this append path. */
foreach(lc, partial_subpaths)
{
Path *path = lfirst(lc);
- parallel_workers = Max(parallel_workers, path->parallel_workers);
+ /*
+ * partial_subpaths can have non-partial subpaths so
+ * path->parallel_workers can be 0. For such paths, allocate one
+ * worker.
+ */
+ parallel_workers +=
+ (path->parallel_workers > 0 ? path->parallel_workers : 1);
+ ereport(DEBUG2,
+ (errmsg_internal("added %d more workers for Parallel Append",
+ (path->parallel_workers > 0 ? path->parallel_workers : 1))));
}
Assert(parallel_workers > 0);
+ /* In no case use more than max_parallel_workers_per_gather. */
+ parallel_workers = Min(parallel_workers,
+ max_parallel_workers_per_gather);
+
/* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
+ appendpath = create_append_path(rel, partial_subpaths,
+ partial_subpath_set,
+ NULL, parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1341,12 +1391,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
subpaths_valid = false;
break;
}
- subpaths = accumulate_append_subpath(subpaths, subpath);
+ subpaths = accumulate_append_subpath(subpaths, subpath, NULL);
}
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(rel, subpaths,
+ NULL, required_outer, 0));
}
}
@@ -1428,9 +1479,11 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_neq_total = true;
startup_subpaths =
- accumulate_append_subpath(startup_subpaths, cheapest_startup);
+ accumulate_append_subpath(startup_subpaths,
+ cheapest_startup, NULL);
total_subpaths =
- accumulate_append_subpath(total_subpaths, cheapest_total);
+ accumulate_append_subpath(total_subpaths,
+ cheapest_total, NULL);
}
/* ... and build the MergeAppend paths */
@@ -1521,6 +1574,43 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
return cheapest;
}
+/* concat_append_subpaths
+ * helper function for accumulate_append_subpath()
+ *
+ * child_partial_subpaths_set is the bitmap set to indicate which of the
+ * childpaths are partial paths. This is currently non-NULL only in case
+ * the childpaths belong to an Append path.
+ */
+static List *
+concat_append_subpaths(List *append_subpaths, List *childpaths,
+ Bitmapset **partial_subpaths_set,
+ Bitmapset *child_partial_subpaths_set)
+{
+ int i;
+ int append_subpath_len = list_length(append_subpaths);
+
+ if (partial_subpaths_set)
+ {
+ for (i = 0; i < list_length(childpaths); i++)
+ {
+ /*
+ * The child paths themselves may or may not be partial paths. The
+ * bitmapset numbers of these paths will need to be set considering
+ * that these are to be appended onto the partial_subpaths_set.
+ */
+ if (!child_partial_subpaths_set ||
+ bms_is_member(i, child_partial_subpaths_set))
+ {
+ *partial_subpaths_set = bms_add_member(*partial_subpaths_set,
+ append_subpath_len + i);
+ }
+ }
+ }
+
+ /* list_copy is important here to avoid sharing list substructure */
+ return list_concat(append_subpaths, list_copy(childpaths));
+}
+
/*
* accumulate_append_subpath
* Add a subpath to the list being built for an Append or MergeAppend
@@ -1534,26 +1624,34 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
* omitting a sort step, which seems fine: if the parent is to be an Append,
* its result would be unsorted anyway, while if the parent is to be a
* MergeAppend, there's no point in a separate sort on a child.
+ *
+ * If partial_subpaths_set is not NULL, it means we are building a
+ * partial subpaths list, and so we need to add the path (or its child paths
+ * in case it's Append or MergeAppend) into the partial_subpaths bitmap set.
*/
static List *
-accumulate_append_subpath(List *subpaths, Path *path)
+accumulate_append_subpath(List *append_subpaths, Path *path,
+ Bitmapset **partial_subpaths_set)
{
if (IsA(path, AppendPath))
{
- AppendPath *apath = (AppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(apath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((AppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ ((AppendPath*)path)->partial_subpaths);
}
else if (IsA(path, MergeAppendPath))
{
- MergeAppendPath *mpath = (MergeAppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(mpath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((MergeAppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ NULL);
}
else
- return lappend(subpaths, path);
+ return concat_append_subpaths(append_subpaths,
+ list_make1(path),
+ partial_subpaths_set,
+ NULL);
}
/*
@@ -1576,7 +1674,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 415edad..65dd027 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -238,23 +239,7 @@ cost_seqscan(Path *path, PlannerInfo *root,
/* Adjust costing for parallelism, if used. */
if (path->parallel_workers > 0)
{
- double parallel_divisor = path->parallel_workers;
- double leader_contribution;
-
- /*
- * Early experience with parallel query suggests that when there is
- * only one worker, the leader often makes a very substantial
- * contribution to executing the parallel portion of the plan, but as
- * more workers are added, it does less and less, because it's busy
- * reading tuples from the workers and doing whatever non-parallel
- * post-processing is needed. By the time we reach 4 workers, the
- * leader no longer makes a meaningful contribution. Thus, for now,
- * estimate that the leader spends 30% of its time servicing each
- * worker, and the remainder executing the parallel plan.
- */
- leader_contribution = 1.0 - (0.3 * path->parallel_workers);
- if (leader_contribution > 0)
- parallel_divisor += leader_contribution;
+ double parallel_divisor;
/*
* In the case of a parallel plan, the row count needs to represent
@@ -263,6 +248,7 @@ cost_seqscan(Path *path, PlannerInfo *root,
* because they'll anticipate receiving more rows than any given copy
* will actually get.
*/
+ parallel_divisor = get_parallel_divisor(path->parallel_workers);
path->rows = clamp_row_est(path->rows / parallel_divisor);
/* The CPU cost is divided among all the workers. */
@@ -391,6 +377,36 @@ cost_gather(GatherPath *path, PlannerInfo *root,
}
/*
+ * get_parallel_divisor
+ * For given number of parallel workers, return parallel divisor, which
+ * can then be used by the caller to estimate per worker cost or per worker
+ * rows.
+ */
+double
+get_parallel_divisor(int parallel_workers)
+{
+ double parallel_divisor = parallel_workers;
+ double leader_contribution;
+
+ /*
+ * Early experience with parallel query suggests that when there is
+ * only one worker, the leader often makes a very substantial
+ * contribution to executing the parallel portion of the plan, but as
+ * more workers are added, it does less and less, because it's busy
+ * reading tuples from the workers and doing whatever non-parallel
+ * post-processing is needed. By the time we reach 4 workers, the
+ * leader no longer makes a meaningful contribution. Thus, for now,
+ * estimate that the leader spends 30% of its time servicing each
+ * worker, and the remainder executing the parallel plan.
+ */
+ leader_contribution = 1.0 - (0.3 * parallel_workers);
+ if (leader_contribution > 0)
+ parallel_divisor += leader_contribution;
+
+ return parallel_divisor;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
@@ -1570,6 +1586,82 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, Relids required_outer)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ double parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost is per worker. We need total count
+ * of each of the subpaths, so that we can determine the total cost
+ * of Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath->parallel_workers);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up.
+ */
+ path->startup_cost = (l == list_head(subpaths)) ? subpath->startup_cost :
+ Min(path->startup_cost, subpath->startup_cost);
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path->parallel_workers);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 01d4fea..20dbc9d 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1197,7 +1197,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ad49674..94b474f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -193,7 +193,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, Bitmapset *partial_subpaths,
+ List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1001,7 +1002,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, best_path->partial_subpaths, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -4941,7 +4942,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, Bitmapset *partial_subpaths, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -4951,6 +4952,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->partial_subplans = bms_copy(partial_subpaths);
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41dde50..1bc3ca2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3404,10 +3404,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b714783..7169126 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -567,7 +567,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -679,7 +679,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6d3ccfd..7ecce5a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1202,50 +1202,28 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
- ListCell *l;
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
+
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
+ pathnode->partial_subpaths = partial_subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
- foreach(l, subpaths)
- {
- Path *subpath = (Path *) lfirst(l);
-
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
- pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
-
- /* All child paths must have same parameterization */
- Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
- }
+ cost_append(&pathnode->path, subpaths, required_outer);
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a025117..a2bf746 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -894,6 +894,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
+
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 51c381e..1311b9c 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d43ec56..94bbab0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1167,6 +1168,7 @@ typedef struct ModifyTableState
/* Per partition tuple conversion map */
} ModifyTableState;
+
/* ----------------
* AppendState information
*
@@ -1174,12 +1176,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index e2fbc7d..428ca66 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -208,6 +208,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ Bitmapset *partial_subplans;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3a1255a..7172861 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1107,6 +1107,7 @@ typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+ Bitmapset *partial_subpaths;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 2a4df2f..ecda17f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -98,6 +99,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths, Relids required_outer);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
@@ -183,6 +185,7 @@ extern void set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel,
double cte_rows);
extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
+extern double get_parallel_divisor(int parallel_workers);
/*
* prototypes for clausesel.c
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 71d9154..69ddf4c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -61,8 +62,9 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index f06cfa4..858d81b 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -1,18 +1,19 @@
SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(11 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(12 rows)
CREATE TABLE foo2(fooid int, f2 int);
INSERT INTO foo2 VALUES(1, 11);
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 18e21b7..f6c4b41 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
On Fri, Dec 23, 2016 at 10:51 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Currently an Append plan node does not execute its subplans in
> parallel. There is no distribution of workers across its subplans.
> [...]
> Secondly, we create a partial Append path for an appendrel only if
> all of its member subpaths are partial paths.
> [...]
> The attached patch removes both of the above restrictions. There has
> already been a mail thread [1] that discusses an approach suggested by
> Robert Haas for implementing this feature. This patch uses that same
> approach.
The first goal requires some kind of synchronization which will allow workers
to be distributed across the subplans. The second goal requires some kind of
synchronization to prevent multiple workers from executing non-parallel
subplans. The patch uses different mechanisms to achieve the goals. If
we create two different patches addressing each goal, those may be
easier to handle.
We may want to think about a third goal: preventing too many workers
from executing the same plan. As per the comment in
get_parallel_divisor(), we do not see any benefit if more than 4
workers execute the same node. So, an Append node with more than 4
workers can distribute them equally among the available subplans. It
might be better to do that as a separate patch.
Attached is pgbench_create_partition.sql (derived from the one
included in the above thread) that distributes pgbench_accounts table
data into 3 partitions pgbench_account_[1-3]. The below queries use
this schema.Consider a query such as :
select count(*) from pgbench_accounts;Now suppose, these two partitions do not allow parallel scan :
alter table pgbench_accounts_1 set (parallel_workers=0);
alter table pgbench_accounts_2 set (parallel_workers=0);On HEAD, due to some of the partitions having non-parallel scans, the
whole Append would be a sequential scan :Aggregate
-> Append
-> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts
-> Seq Scan on pgbench_accounts_1
-> Seq Scan on pgbench_accounts_2
-> Seq Scan on pgbench_accounts_3Whereas, with the patch, the Append looks like this :
Finalize Aggregate
-> Gather
Workers Planned: 6
-> Partial Aggregate
-> Parallel Append
-> Parallel Seq Scan on pgbench_accounts
-> Seq Scan on pgbench_accounts_1
-> Seq Scan on pgbench_accounts_2
-> Parallel Seq Scan on pgbench_accounts_3Above, Parallel Append is generated, and it executes all these
subplans in parallel, with 1 worker executing each of the sequential
scans, and multiple workers executing each of the parallel subplans.======= Implementation details ========
------- Adding parallel-awareness -------
In a given worker, this Append plan node will be executing just like
the usual partial Append node. It will run a subplan until completion.
The subplan may or may not be a partial parallel-aware plan like
parallelScan. After the subplan is done, Append will choose the next
subplan. It is here where it will be different than the current
partial Append plan: it is parallel-aware. The Append nodes in the
workers will be aware that there are other Append nodes running in
parallel. The partial Append will have to coordinate with other Append
nodes while choosing the next subplan.------- Distribution of workers --------
The coordination info is stored in a shared array, each element of
which describes the per-subplan info. This info contains the number of
workers currently executing the subplan, and the maximum number of
workers that should be executing it at the same time. For non-partial
sublans, max workers would always be 1. For choosing the next subplan,
the Append executor will sequentially iterate over the array to find a
subplan having the least number of workers currently being executed,
AND which is not already being executed by the maximum number of
workers assigned for the subplan. Once it gets one, it increments
current_workers, and releases the Spinlock, so that other workers can
choose their next subplan if they are waiting.This way, workers would be fairly distributed across subplans.
The shared array needs to be initialized and made available to
workers. For this, we can do exactly what sequential scan does for
being parallel-aware : Using function ExecAppendInitializeDSM()
similar to ExecSeqScanInitializeDSM() in the backend to allocate the
array. Similarly, for workers, have ExecAppendInitializeWorker() to
retrieve the shared array.-------- Generating Partial Append plan having non-partial subplans --------
In set_append_rel_pathlist(), while generating a partial path for
Append, also include the non-partial child subpaths, besides the
partial subpaths. This way, it can contain a mix of partial and
non-partial children paths. But for a given child, its path would be
either the cheapest partial path or the cheapest non-partial path.
For a non-partial child path, it will only be included if it is
parallel-safe. If there is no parallel-safe path, a partial Append
path would not be generated. This behaviour also automatically
prevents paths that have a Gather node beneath.
Finally, when it comes to creating a partial Append path using these
child paths, we also need to store a bitmap set indicating which of
the child paths are non-partial paths. For this, have a new BitmapSet
field : Append->partial_subplans. At execution time, this will be used
to set the maximum workers for a non-partial subpath to 1.
We may be able to eliminate this field. Please check comment 6 below.
-------- Costing -------
For calculating per-worker parallel Append path cost, it first
calculates a total of child subplan costs considering all of their
workers, and then divides it by the Append node's parallel_divisor,
similar to how parallel scan uses this "parallel_divisor".
For startup cost, it is assumed that Append would start returning
tuples when the child node having the lowest startup cost is done
setting up. So Append startup cost is equal to startup cost of the
child with minimum startup cost.
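For concreteness, a minimal sketch of that costing, with invented names
and signature (the patch does this inside cost_append(); this is only
the shape of the computation, not the patch's code):

/*
 * Sketch only: per-worker total cost is the sum of the subpath total
 * costs divided by the parallel_divisor; startup cost is the minimum
 * subpath startup cost.  Assumes nsubpaths >= 1.
 */
static void
parallel_append_cost_sketch(const double *startup_cost,
                            const double *total_cost,
                            int nsubpaths, double parallel_divisor,
                            double *append_startup, double *append_total)
{
    double  sum = 0.0;
    double  min_startup = startup_cost[0];
    int     i;

    for (i = 0; i < nsubpaths; i++)
    {
        sum += total_cost[i];
        if (startup_cost[i] < min_startup)
            min_startup = startup_cost[i];
    }

    *append_startup = min_startup;
    *append_total = sum / parallel_divisor;
}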
-------- Scope --------
There are two different code paths where Append path is generated.
1. One is where append rel is generated : Inheritance table, and UNION
ALL clause.
2. The second codepath is in prepunion.c. This gets executed for UNION
(without ALL) and INTERSECT/EXCEPT [ALL]. The patch does not support
Parallel Append in this scenario. It can be taken up later as an
extension, once this patch is reviewed.
Here are some review comments
1. struct ParallelAppendDescData is used in other places as well. The
declaration style doesn't seem to be very common in the code or in the
directory where the file is located.
+struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
+};
Defining it like
typedef struct ParallelAppendDescData
{
slock_t pa_mutex; /* mutual exclusion to choose next subplan */
parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
};
will make its use handy. Instead of struct ParallelAppendDescData, you will
need to use just ParallelAppendDescData. Maybe we want to rename
parallel_append_info as ParallelAppendInfo and change the style to match other
declarations.
2. The comment below refers to the constant which it describes, which looks
odd. Maybe it should be worded as "A special value of
AppendState::as_whichplan, to indicate no plans left to be executed.". Also
using INVALID for "no plans left ..." seems to be a misnomer.
/*
* For Parallel Append, AppendState::as_whichplan can have PA_INVALID_PLAN
* value, which indicates there are no plans left to be executed.
*/
#define PA_INVALID_PLAN -1
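For instance, the comment and constant could look like this (the name
PA_SUBPLANS_DONE is only an illustration, not something from the patch):

/* A special value of AppendState::as_whichplan: no plans left to execute. */
#define PA_SUBPLANS_DONE    (-1)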
3. The sentence "We have got NULL" looks odd. Probably we don't need it, as
it's clear from the code above that this code deals with the case when the
current subplan didn't return any row.
/*
* We have got NULL. There might be other workers still processing the
* last chunk of rows for this same node, but there's no point for new
* workers to run this node, so mark this node as finished.
*/
4. In the same comment, I guess, the word "node" refers to "subnode" and not
the node pointed to by the variable "node". Maybe you want to use the word
"subplan" here.
4. set_finished()'s prologue has different indentation compared to other
functions in the file.
5. Multilevel comment starts with an empty line.
+ /* Keep track of the node with the least workers so far. For the very
6. By looking at the parallel_worker field of a path, we can say whether it's
partial or not. We probably do not need to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial append plan. The reasons to take this small step are: 1. we
want to minimize our work at the time of creating paths; 2. while freeing a
path in add_path, we don't free the internal structures, in this case the
Bitmapset, which will waste memory if the path is not chosen while planning.
7. If we consider 6, we don't need concat_append_subpaths(), but still here are
some comments about that function. Instead of accepting two separate arguments
childpaths and child_partial_subpaths_set, which need to be in sync, we can
just pass the path which contains both of those. Also, the following code may
be optimized by adding a utility function to Bitmapset, which advances all
members by a given offset, and using that function with bms_union() to merge
the bitmapsets, e.g.
bms_union(*partial_subpaths_set,
bms_advance_members(bms_copy(child_partial_subpaths_set), append_subpath_len));
if (partial_subpaths_set)
{
for (i = 0; i < list_length(childpaths); i++)
{
/*
* The child paths themselves may or may not be partial paths. The
* bitmapset numbers of these paths will need to be set considering
* that these are to be appended onto the partial_subpaths_set.
*/
if (!child_partial_subpaths_set ||
bms_is_member(i, child_partial_subpaths_set))
{
*partial_subpaths_set = bms_add_member(*partial_subpaths_set,
append_subpath_len + i);
}
}
}
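For concreteness, a minimal sketch of such a bms_advance_members() helper,
built on the existing bms_next_member()/bms_add_member() API (the function
does not exist in the tree; its name and behaviour are only what is
proposed above):

/*
 * Hypothetical helper: return a set whose members are those of 'a'
 * shifted upward by 'offset'.  Consumes 'a', so callers would pass a
 * copy, as in the bms_copy() usage above.
 */
Bitmapset *
bms_advance_members(Bitmapset *a, int offset)
{
    Bitmapset  *result = NULL;
    int         x = -1;

    while ((x = bms_next_member(a, x)) >= 0)
        result = bms_add_member(result, x + offset);

    bms_free(a);
    return result;
}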
8.
- parallel_workers = Max(parallel_workers, path->parallel_workers);
+ /*
+ * partial_subpaths can have non-partial subpaths so
+ * path->parallel_workers can be 0. For such paths, allocate one
+ * worker.
+ */
+ parallel_workers +=
+ (path->parallel_workers > 0 ? path->parallel_workers : 1);
This looks odd. Earlier code was choosing maximum of all parallel workers,
whereas new code adds them all. E.g. if parallel_workers for subpaths is 3, 4,
3, without your change, it will pick up 4. But with your change it will pick
10. I think, you intend to write this as
parallel_workers = Max(parallel_workers, path->parallel_workers ?
path->parallel_workers : 1);
If you do that, you probably don't need the cap below, since parallel_workers
is never set higher than max_parallel_workers_per_gather.
+ /* In no case use more than max_parallel_workers_per_gather. */
+ parallel_workers = Min(parallel_workers,
+ max_parallel_workers_per_gather);
+
9. Shouldn't this function return double?
int
get_parallel_divisor(int parallel_workers)
9. In get_parallel_divisor(), if parallel_workers is 0, i.e. it's a partial path,
the return value will be 2, which isn't true. This function is being called for
all the subpaths to get the original number of rows and costs of partial paths.
I think we don't need to call this function on subpaths which are not partial
paths, or make it work for parallel_workers = 0.
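For reference, the divisor logic in question looks roughly like this in
costsize.c (paraphrased from memory, so treat the details as indicative
only):

/* Approximate shape of the existing leader-contribution logic. */
double parallel_divisor = parallel_workers;
double leader_contribution = 1.0 - (0.3 * parallel_workers);

if (leader_contribution > 0)
    parallel_divisor += leader_contribution;

With parallel_workers = 0 this yields a divisor of 1.0, which is also what
the follow-up reply further down concludes.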
10. We should probably move the parallel_safe calculation out of cost_append().
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
11. This check shouldn't be part of cost_append().
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
12. cost_append() essentially adds costs of all the subpaths and then divides
by parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing all the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.
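A toy model of the mixed costing suggested here (purely illustrative; all
names are made up, and the real logic would have to reason about how
workers are scheduled onto subplans):

/*
 * Partial subpath costs are already per-worker, so they add up directly;
 * a non-partial subpath's whole cost is borne by one worker, so at best
 * the non-partial work spreads over as many workers as there are
 * non-partial subpaths.
 */
static double
mixed_append_per_worker_cost(const double *total_cost,
                             const int *is_partial,
                             int nsubpaths, int nworkers)
{
    double  partial_sum = 0.0;
    double  nonpartial_sum = 0.0;
    int     nonpartial_count = 0;
    int     i;

    for (i = 0; i < nsubpaths; i++)
    {
        if (is_partial[i])
            partial_sum += total_cost[i];
        else
        {
            nonpartial_sum += total_cost[i];
            nonpartial_count++;
        }
    }

    if (nonpartial_count > 0)
        nonpartial_sum /= (nworkers < nonpartial_count) ?
                          nworkers : nonpartial_count;

    return partial_sum + nonpartial_sum;
}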
13. No braces required for single line block
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ {
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+ }
14. exec_append_scan_first() is a one-liner function, should we just inline it?
15. This patch replaces exec_append_initialize_next() with
exec_append_scan_first(). The earlier function was handling backward and
forward scans separately, but the latter function doesn't do that. Why?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Thanks Ashutosh for the feedback.
On 6 January 2017 at 17:04, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Fri, Dec 23, 2016 at 10:51 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Currently an Append plan node does not execute its subplans in
parallel. There is no distribution of workers across its subplans. The
second subplan starts running only after the first subplan finishes,
although the individual subplans may be running parallel scans.
Secondly, we create a partial Append path for an appendrel, but we do
that only if all of its member subpaths are partial paths. If one or
more of the subplans is a non-parallel path, there will be only a
non-parallel Append. So whatever node is sitting on top of Append is
not going to do a parallel plan; for example, a select count(*) won't
divide it into partial aggregates if the underlying Append is not
partial.
The attached patch removes both of the above restrictions. There has
already been a mail thread [1] that discusses an approach suggested by
Robert Haas for implementing this feature. This patch uses this same
approach.
The first goal requires some kind of synchronization which will allow workers
to be distributed across the subplans. The second goal requires some kind of
synchronization to prevent multiple workers from executing non-parallel
subplans. The patch uses different mechanisms to achieve the goals. If
we create two different patches addressing each goal, those may be
easier to handle.
Goal A : Allow non-partial subpaths in Partial Append.
Goal B : Distribute workers across the Append subplans.
Both of these require some kind of synchronization while choosing the
next subplan. So, goal B is achieved by doing all the synchronization
stuff. And implementation of goal A requires that goal B is
implemented. So there is a dependency between these two goals. While
implementing goal B, we should keep in mind that it should also work
for goal A; it does not make sense to later change the synchronization
logic for goal A.
I am ok with splitting the patch into 2 patches :
a) changes required for goal A
b) changes required for goal B.
But I think we should split it only when we are ready to commit them
(commit for B, immediately followed by commit for A). Until then, we
should consider both of these together because they are interconnected
as explained above.
We may want to think about a third goal: preventing too many workers
from executing the same plan. As per comment in get_parallel_divisor()
we do not see any benefit if more than 4 workers execute the same
node. So, an append node can distribute more than 4 worker nodes
equally among the available subplans. It might be better to do that as
a separate patch.
I think that comment is for calculating leader contribution. It does
not say that 4 workers is too many workers in general.
But yes, I agree, and I have it in mind as the next improvement.
Basically, it does not make sense to give more than 3 workers to a
subplan when parallel_workers for that subplan is 3. For example, if
gather max workers is 10, and we have 2 Append subplans s1 and s2 with
parallel workers 3 and 5 respectively, then with the current patch,
it will distribute 4 workers to each of these subplans. What we should
do is : once both of the subplans get 3 workers each, we should give
the 7th and 8th worker to s2.
Now that I think of that, I think for implementing above, we need to
keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.
Here are some review comments
I will handle the other comments, but first, just a quick response to
some important ones :
6. By looking at the parallel_worker field of a path, we can say whether it's
partial or not. We probably do not need to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial append plan. The reasons to take this small step are: 1. we
want to minimize our work at the time of creating paths; 2. while freeing a
path in add_path, we don't free the internal structures, in this case the
Bitmapset, which will waste memory if the path is not chosen while planning.
Let me try keeping the per-subplan max_worker info in Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by max_worker field. In case of non-partial subpath,
max_worker will be 1. (this is the same info kept in AppendState node
in the patch, but now we might need to keep it in Append path node as
well).
7. If we consider 6, we don't need concat_append_subpaths(), but still here are
some comments about that function. Instead of accepting two separate arguments
childpaths and child_partial_subpaths_set, which need to be in sync, we can
just pass the path which contains both of those. Also, the following code may
be optimized by adding a utility function to Bitmapset, which advances all
members by a given offset, and using that function with bms_union() to merge
the bitmapsets, e.g.
bms_union(*partial_subpaths_set,
bms_advance_members(bms_copy(child_partial_subpaths_set), append_subpath_len));
if (partial_subpaths_set)
{
for (i = 0; i < list_length(childpaths); i++)
{
/*
* The child paths themselves may or may not be partial paths. The
* bitmapset numbers of these paths will need to be set considering
* that these are to be appended onto the partial_subpaths_set.
*/
if (!child_partial_subpaths_set ||
bms_is_member(i, child_partial_subpaths_set))
{
*partial_subpaths_set = bms_add_member(*partial_subpaths_set,
append_subpath_len + i);
}
}
}
Again, for the reason mentioned above, we will defer this point for now.
8.
- parallel_workers = Max(parallel_workers, path->parallel_workers);
+ /*
+ * partial_subpaths can have non-partial subpaths so
+ * path->parallel_workers can be 0. For such paths, allocate one
+ * worker.
+ */
+ parallel_workers +=
+ (path->parallel_workers > 0 ? path->parallel_workers : 1);
This looks odd. Earlier code was choosing maximum of all parallel workers,
whereas new code adds them all. E.g. if parallel_workers for subpaths is 3, 4,
3, without your change, it will pick up 4. But with your change it will pick
10. I think, you intend to write this as
parallel_workers = Max(parallel_workers, path->parallel_workers ?
path->parallel_workers : 1);
The intention is to add all workers, because a parallel-aware Append
is going to need them in order to make the subplans run at their
full capacity in parallel. So with subpaths with 3, 4, and 3 workers,
the Append path will need 10 workers. If it allocates 4 workers, it's
not sufficient: each of them would get only 1 worker, or at most 2. In
the existing code, 4 is correct, because all the workers are going to
execute the same subplan node at a time.
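In other words, the computation being argued for has roughly this shape
(a sketch, not the patch's code; the function and parameter names are
made up):

/*
 * Sketch: sum each subpath's worker count, charging at least one
 * worker for a non-partial subpath, then cap at the GUC limit.
 */
static int
append_worker_budget(const int *subpath_workers, int nsubpaths,
                     int max_per_gather)
{
    int     total = 0;
    int     i;

    for (i = 0; i < nsubpaths; i++)
        total += (subpath_workers[i] > 0) ? subpath_workers[i] : 1;

    return (total < max_per_gather) ? total : max_per_gather;
}

With subpath workers of 3, 4, and 3 this yields min(10,
max_parallel_workers_per_gather), matching the example above.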
9. Shouldn't this function return double?
int
get_parallel_divisor(int parallel_workers)
Yes, right, I will do that.
9. In get_parallel_divisor(), if parallel_workers is 0, i.e. it's a partial path,
the return value will be 2, which isn't true. This function is being called for
all the subpaths to get the original number of rows and costs of partial paths.
I think we don't need to call this function on subpaths which are not partial
paths, or make it work for parallel_workers = 0.
I didn't understand this. I checked again get_parallel_divisor()
function code. I think it will return 1 if parallel_workers is 0. But
I may be missing something.
12. cost_append() essentially adds costs of all the subpaths and then divides
by parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing all the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.
With the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something. With the current
parallelism costing infrastructure, I am not sure though.
13. No braces required for single line block
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ {
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+ }
14. exec_append_scan_first() is a one-liner function, should we just inline it?
15. This patch replaces exec_append_initialize_next() with
exec_append_scan_first(). The earlier function was handling backward and
forward scans separately, but the latter function doesn't do that. Why?
I will come to these and some other ones later.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Jan 16, 2017 at 9:49 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Thanks Ashutosh for the feedback.
On 6 January 2017 at 17:04, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Fri, Dec 23, 2016 at 10:51 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Currently an Append plan node does not execute its subplans in
parallel. There is no distribution of workers across its subplans. The
second subplan starts running only after the first subplan finishes,
although the individual subplans may be running parallel scans.
Secondly, we create a partial Append path for an appendrel, but we do
that only if all of its member subpaths are partial paths. If one or
more of the subplans is a non-parallel path, there will be only a
non-parallel Append. So whatever node is sitting on top of Append is
not going to do a parallel plan; for example, a select count(*) won't
divide it into partial aggregates if the underlying Append is not
partial.
The attached patch removes both of the above restrictions. There has
already been a mail thread [1] that discusses an approach suggested by
Robert Haas for implementing this feature. This patch uses this same
approach.
The first goal requires some kind of synchronization which will allow workers
to be distributed across the subplans. The second goal requires some kind of
synchronization to prevent multiple workers from executing non-parallel
subplans. The patch uses different mechanisms to achieve the goals. If
we create two different patches addressing each goal, those may be
easier to handle.
Goal A : Allow non-partial subpaths in Partial Append.
Goal B : Distribute workers across the Append subplans.
Both of these require some kind of synchronization while choosing the
next subplan. So, goal B is achieved by doing all the synchronization
stuff. And implementation of goal A requires that goal B is
implemented. So there is a dependency between these two goals. While
implementing goal B, we should keep in mind that it should also work
for goal A; it does not make sense to later change the synchronization
logic for goal A.
I am ok with splitting the patch into 2 patches :
a) changes required for goal A
b) changes required for goal B.
But I think we should split it only when we are ready to commit them
(commit for B, immediately followed by commit for A). Until then, we
should consider both of these together because they are interconnected
as explained above.
For B, we need to know how much gain it brings and in which cases.
If that gain is not worth the complexity added, we may have to defer
Goal B. Goal A would certainly be useful since it will improve
performance of the targeted cases. The synchronization required for
Goal A is simpler than that of B, and thus if we choose to implement
only A, we can live with a simpler synchronization.
BTW, right now, the patch does not consider non-partial paths for a
child which has partial paths. Do we know for sure that a path
containing partial paths for a child which has them is always going to
be cheaper than the one which includes a non-partial path? If not,
should we build another path which contains non-partial paths for all
child relations? This sounds like a 0/1 knapsack problem.
Here are some review comments
I will handle the other comments, but first, just a quick response to
some important ones :
6. By looking at the parallel_worker field of a path, we can say whether it's
partial or not. We probably do not need to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial append plan. The reasons to take this small step are: 1. we
want to minimize our work at the time of creating paths; 2. while freeing a
path in add_path, we don't free the internal structures, in this case the
Bitmapset, which will waste memory if the path is not chosen while planning.
Let me try keeping the per-subplan max_worker info in Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by max_worker field. In case of non-partial subpath,
max_worker will be 1. (this is the same info kept in AppendState node
in the patch, but now we might need to keep it in Append path node as
well).
It will be better if we can fetch that information from each subpath
when creating the plan. As I have explained before, a path is a minimal
structure, which should be easily disposable when throwing away the
path.
7. If we consider 6, we don't need concat_append_subpaths(), but still here are
some comments about that function. Instead of accepting two separate arguments
childpaths and child_partial_subpaths_set, which need to be in sync, we can
just pass the path which contains both of those. Also, the following code may
be optimized by adding a utility function to Bitmapset, which advances all
members by a given offset, and using that function with bms_union() to merge
the bitmapsets, e.g.
bms_union(*partial_subpaths_set,
bms_advance_members(bms_copy(child_partial_subpaths_set), append_subpath_len));
if (partial_subpaths_set)
{
for (i = 0; i < list_length(childpaths); i++)
{
/*
* The child paths themselves may or may not be partial paths. The
* bitmapset numbers of these paths will need to be set considering
* that these are to be appended onto the partial_subpaths_set.
*/
if (!child_partial_subpaths_set ||
bms_is_member(i, child_partial_subpaths_set))
{
*partial_subpaths_set = bms_add_member(*partial_subpaths_set,
append_subpath_len + i);
}
}
}
Again, for the reason mentioned above, we will defer this point for now.
Ok.
8.
- parallel_workers = Max(parallel_workers, path->parallel_workers);
+ /*
+ * partial_subpaths can have non-partial subpaths so
+ * path->parallel_workers can be 0. For such paths, allocate one
+ * worker.
+ */
+ parallel_workers +=
+ (path->parallel_workers > 0 ? path->parallel_workers : 1);
This looks odd. Earlier code was choosing maximum of all parallel workers,
whereas new code adds them all. E.g. if parallel_workers for subpaths is 3, 4,
3, without your change, it will pick up 4. But with your change it will pick
10. I think, you intend to write this as
parallel_workers = Max(parallel_workers, path->parallel_workers ?
path->parallel_workers : 1);
The intention is to add all workers, because a parallel-aware Append
is going to need them in order to make the subplans run at their
full capacity in parallel. So with subpaths with 3, 4, and 3 workers,
the Append path will need 10 workers. If it allocates 4 workers, it's
not sufficient: each of them would get only 1 worker, or at most 2. In
the existing code, 4 is correct, because all the workers are going to
execute the same subplan node at a time.
Ok, makes sense if we take up Goal B.
9. In get_parallel_divisor(), if parallel_workers is 0, i.e. it's a partial path,
the return value will be 2, which isn't true. This function is being called for
all the subpaths to get the original number of rows and costs of partial paths.
I think we don't need to call this function on subpaths which are not partial
paths, or make it work for parallel_workers = 0.
I didn't understand this. I checked again get_parallel_divisor()
function code. I think it will return 1 if parallel_workers is 0. But
I may be missing something.
Sorry, I also don't understand why I had that comment. For some
reason, I thought we were sending 1 when parallel_workers = 0 to
get_parallel_divisor(). But I don't understand why I thought so.
Anyway, I will provide a better explanation next time I run into
this.
12. cost_append() essentially adds costs of all the subpaths and then divides
by parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing all the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.
With the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something. With the current
parallelism costing infrastructure, I am not sure though.
The current parallel mechanism is in sync with that costing. Each
worker is supposed to take the same burden, hence the same (average)
cost. But it will change when a single worker has to scan an entire
child relation and different child relations have different sizes.
Thanks for working on the comments.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Amit,
On 2016/12/23 14:21, Amit Khandekar wrote:
Currently an Append plan node does not execute its subplans in
parallel. There is no distribution of workers across its subplans. The
second subplan starts running only after the first subplan finishes,
although the individual subplans may be running parallel scans.
Secondly, we create a partial Append path for an appendrel, but we do
that only if all of its member subpaths are partial paths. If one or
more of the subplans is a non-parallel path, there will be only a
non-parallel Append. So whatever node is sitting on top of Append is
not going to do a parallel plan; for example, a select count(*) won't
divide it into partial aggregates if the underlying Append is not
partial.
The attached patch removes both of the above restrictions. There has
already been a mail thread [1] that discusses an approach suggested by
Robert Haas for implementing this feature. This patch uses this same
approach.
I was looking at the executor portion of this patch and I noticed that in
exec_append_initialize_next():
if (appendstate->as_padesc)
return parallel_append_next(appendstate);
/*
* Not parallel-aware. Fine, just go on to the next subplan in the
* appropriate direction.
*/
if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
appendstate->as_whichplan++;
else
appendstate->as_whichplan--;
which seems to mean that executing Append in parallel mode disregards the
scan direction. I am not immediately sure what implications that has, so
I checked what heap scan does when executing in parallel mode, and found
this in heapgettup():
else if (backward)
{
/* backward parallel scan not supported */
Assert(scan->rs_parallel == NULL);
Perhaps AppendState.as_padesc would not have been set if the scan direction
is backward, because parallel mode would be disabled for the whole query
in that case (PlannerGlobal.parallelModeOK = false). Maybe add an
Assert() similar to the one in heapgettup().
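Something along these lines, presumably (a sketch of the suggested guard,
not code from the patch):

/* Parallel Append should never be reached with a backward scan. */
Assert(appendstate->as_padesc == NULL ||
       ScanDirectionIsForward(appendstate->ps.state->es_direction));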
Thanks,
Amit
There have been some reviews, but the patch has not been updated in
two weeks. Marking as "returned with feedback".
--
Michael
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
We may want to think about a third goal: preventing too many workers
from executing the same plan. As per comment in get_parallel_divisor()
we do not see any benefit if more than 4 workers execute the same
node. So, an append node can distribute more than 4 worker nodes
equally among the available subplans. It might be better to do that as
a separate patch.
I think that comment is for calculating leader contribution. It does
not say that 4 workers is too many workers in general.
But yes, I agree, and I have it in mind as the next improvement.
Basically, it does not make sense to give more than 3 workers to a
subplan when parallel_workers for that subplan is 3. For example, if
gather max workers is 10, and we have 2 Append subplans s1 and s2 with
parallel workers 3 and 5 respectively, then with the current patch,
it will distribute 4 workers to each of these subplans. What we should
do is : once both of the subplans get 3 workers each, we should give
the 7th and 8th worker to s2.
Now that I think of that, I think for implementing above, we need to
keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.
Attached is the v2 patch, which implements the above. Now the Append plan
node stores a list of per-subplan max worker counts, rather than the
Bitmapset. But the Bitmapset still turned out to be necessary for
AppendPath. More details are in the subsequent comments.
Goal A : Allow non-partial subpaths in Partial Append.
Goal B : Distribute workers across the Append subplans.
Both of these require some kind of synchronization while choosing the
next subplan. So, goal B is achieved by doing all the synchronization
stuff. And implementation of goal A requires that goal B is
implemented. So there is a dependency between these two goals. While
implementing goal B, we should keep in mind that it should also work
for goal A; it does not make sense to later change the synchronization
logic for goal A.
I am ok with splitting the patch into 2 patches :
a) changes required for goal A
b) changes required for goal B.
But I think we should split it only when we are ready to commit them
(commit for B, immediately followed by commit for A). Until then, we
should consider both of these together because they are interconnected
as explained above.
For B, we need to know how much gain it brings and in which cases.
If that gain is not worth the complexity added, we may have to defer
Goal B. Goal A would certainly be useful since it will improve
performance of the targeted cases. The synchronization required for
Goal A is simpler than that of B and thus if we choose to implement
only A, we can live with a simpler synchronization.
For Goal A, the logic for a worker synchronously choosing a subplan will be :
Go to the next subplan. If that subplan has not already been assigned its
max workers, choose this subplan; otherwise, go to the next subplan, and
so on.
For Goal B, the logic will be :
Among the subplans which are yet to reach their max workers, choose the
subplan with the minimum number of workers currently assigned.
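Side by side, the two selection rules might look like this (sketches only,
written against the patch's shared pa_info[] array, assuming the caller
holds pa_mutex; neither function exists in the patch in this form):

/* Goal A: take the first subplan that is neither finished (-1) nor
 * already at its max worker count. */
static int
choose_next_subplan_goal_a(ParallelAppendInfo *pa_info, int nplans)
{
    int     i;

    for (i = 0; i < nplans; i++)
    {
        if (pa_info[i].pa_num_workers >= 0 &&
            pa_info[i].pa_num_workers < pa_info[i].pa_max_workers)
            return i;
    }
    return -1;              /* i.e. PA_INVALID_PLAN */
}

/* Goal B: among the eligible subplans, take the least-loaded one. */
static int
choose_next_subplan_goal_b(ParallelAppendInfo *pa_info, int nplans)
{
    int     best = -1;
    int     i;

    for (i = 0; i < nplans; i++)
    {
        if (pa_info[i].pa_num_workers < 0 ||
            pa_info[i].pa_num_workers >= pa_info[i].pa_max_workers)
            continue;
        if (best < 0 ||
            pa_info[i].pa_num_workers < pa_info[best].pa_num_workers)
            best = i;
    }
    return best;
}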
I don't think there is a significant difference between the complexity
of the above two algorithms. So I don't think complexity is a factor on
which we can choose the particular logic. We should choose the logic
which has more potential for benefits. The logic for goal B will work
for goal A as well. And secondly, if the subplans are using their own
different system resources, the resource contention might be less. One
case is : all subplans using different disks. Second case is : some of
the subplans may be using a foreign scan, so it would start using
foreign server resources sooner. These benefits apply when the Gather
max workers count is not sufficient for running all the subplans at
their full capacity. If it is sufficient, then the workers will be
distributed over the subplans under either logic; just the order of
assignment of workers to subplans will be different.
Also, I don't see a disadvantage if we follow the logic of Goal B.
BTW, right now, the patch does not consider non-partial paths for a
child which has partial paths. Do we know for sure that a path
containing partial paths for a child which has them is always going to
be cheaper than the one which includes a non-partial path? If not,
should we build another path which contains non-partial paths for all
child relations? This sounds like a 0/1 knapsack problem.
I didn't quite get this. We do create a non-partial Append path using
non-partial child paths anyway.
Here are some review comments
I will handle the other comments, but first, just a quick response to
some important ones :
6. By looking at the parallel_worker field of a path, we can say whether it's
partial or not. We probably do not need to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial append plan. The reasons to take this small step are: 1. we
want to minimize our work at the time of creating paths; 2. while freeing a
path in add_path, we don't free the internal structures, in this case the
Bitmapset, which will waste memory if the path is not chosen while planning.
Let me try keeping the per-subplan max_worker info in Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by max_worker field. In case of non-partial subpath,
max_worker will be 1. (this is the same info kept in AppendState node
in the patch, but now we might need to keep it in Append path node as
well).
It will be better if we can fetch that information from each subpath
when creating the plan. As I have explained before, a path is a minimal
structure, which should be easily disposable when throwing away the
path.
Now in the v2 patch, we store per-subplan worker count. But still, we
cannot use the path->parallel_workers to determine whether it's a
partial path. This is because even for a non-partial path, it seems
the parallel_workers can be non-zero. For e.g., in
create_subqueryscan_path(), it sets path->parallel_workers to
subpath->parallel_workers. But this path is added as a non-partial
path. So we need a separate info as to which of the subpaths in Append
path are partial subpaths. So in the v2 patch, I continued to use
Bitmapset in AppendPath. But in Append plan node, number of workers is
calculated using this bitmapset. Check the new function
get_append_num_workers().
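The shape of that function is presumably something like the following
sketch (illustrative only; the real definition is in the attached patch,
and its third argument, passed as NULL at the call site, is ignored here):

/*
 * Sketch: one worker per non-partial subpath, parallel_workers for
 * each partial one, capped at max_parallel_workers_per_gather.
 */
static int
get_append_num_workers_sketch(List *subpaths,
                              Bitmapset *partial_subpath_set)
{
    int         parallel_workers = 0;
    int         i = 0;
    ListCell   *lc;

    foreach(lc, subpaths)
    {
        Path   *subpath = (Path *) lfirst(lc);

        if (bms_is_member(i, partial_subpath_set))
            parallel_workers += subpath->parallel_workers;  /* partial */
        else
            parallel_workers++;     /* non-partial: one worker */
        i++;
    }

    return Min(parallel_workers, max_parallel_workers_per_gather);
}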
7. If we consider 6, we don't need concat_append_subpaths(),
As explained above, I have kept the BitmapSet for AppendPath.
but still here are
some comments about that function. Instead of accepting two separate arguments
childpaths and child_partial_subpaths_set, which need to be in sync, we can
just pass the path which contains both of those. Also, the following code may
be optimized by adding a utility function to Bitmapset, which advances all
members by a given offset, and using that function with bms_union() to merge
the bitmapsets, e.g.
bms_union(*partial_subpaths_set,
bms_advance_members(bms_copy(child_partial_subpaths_set), append_subpath_len));
if (partial_subpaths_set)
I will get back on this after more thought.
12. cost_append() essentially adds costs of all the subpaths and then divides
by parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing all the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.
With the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something. With the current
parallelism costing infrastructure, I am not sure though.
The current parallel mechanism is in sync with that costing. Each
worker is supposed to take the same burden, hence the same (average)
cost. But it will change when a single worker has to scan an entire
child relation and different child relations have different sizes.
I gave this more thought. Considering that each subplan has a different
number of workers, I think it makes sense to calculate the average
per-worker cost even in a parallel Append. In the case of a non-partial
subplan, a single worker will execute it, but it will then choose
another subplan. So on average each worker is going to process the
same number of rows, and also the same amount of CPU. And that amount
of CPU cost and rows cost should be calculated by taking the total
count and dividing it by the number of workers (the parallel_divisor,
actually).
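For instance (numbers made up for illustration): with two subplans of
total cost 60 and 40 and four workers, a worker that finishes one subplan
moves on to another, so on average each worker handles about
(60 + 40) / 4 = 25 units of work, which is exactly what dividing the
summed cost by the parallel_divisor models.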
Here are some review comments
1. struct ParallelAppendDescData is used in other places as well. The
declaration style doesn't seem to be very common in the code or in the
directory where the file is located.
+struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
+};
Defining it like
typedef struct ParallelAppendDescData
{
slock_t pa_mutex; /* mutual exclusion to choose next subplan */
parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
};
will make its use handy. Instead of struct ParallelAppendDescData, you will
need to use just ParallelAppendDescData. Maybe we want to rename
parallel_append_info as ParallelAppendInfo and change the style to match other
declarations.
2. The comment below refers to the constant which it describes, which looks
odd. Maybe it should be worded as "A special value of
AppendState::as_whichplan, to indicate no plans left to be executed.". Also
using INVALID for "no plans left ..." seems to be a misnomer.
/*
* For Parallel Append, AppendState::as_whichplan can have PA_INVALID_PLAN
* value, which indicates there are no plans left to be executed.
*/
#define PA_INVALID_PLAN -1
3. The sentence "We have got NULL" looks odd. Probably we don't need it, as
it's clear from the code above that this code deals with the case when the
current subplan didn't return any row.
/*
* We have got NULL. There might be other workers still processing the
* last chunk of rows for this same node, but there's no point for new
* workers to run this node, so mark this node as finished.
*/
4. In the same comment, I guess, the word "node" refers to "subnode" and not
the node pointed to by the variable "node". Maybe you want to use the word
"subplan" here.
4. set_finished()'s prologue has different indentation compared to other
functions in the file.
5. Multilevel comment starts with an empty line.
+ /* Keep track of the node with the least workers so far. For the very
Done 1. to 5. above, as per your suggestions.
9. Shouldn't this function return double?
int
get_parallel_divisor(int parallel_workers)
The v2 patch is rebased on the latest master branch, which already has
this function returning double.
10. We should probably move the parallel_safe calculation out of cost_append().
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
11. This check shouldn't be part of cost_append().
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
Yet to handle the above ones.
Attachment: ParallelAppend_v2.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index e01fe6d..0b50ab9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeSeqscan.h"
@@ -201,6 +202,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -249,6 +254,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecSeqScanInitializeDSM((SeqScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
@@ -725,6 +734,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
case T_SeqScanState:
ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..97bfc89 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,48 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendInfo
+{
+ int pa_num_workers; /* workers currently executing the subplan */
+ int pa_max_workers; /* max workers that should run the subplan */
+} ParallelAppendInfo;
+
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +116,22 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ return parallel_append_next(appendstate);
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +233,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +252,14 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we are already finished plans from parallel append */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+ MyProcPid);
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +281,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +336,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +351,223 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ * pa_max_workers is already set initially.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+ List *num_workers_list = ((Append*)node->ps.plan)->num_workers;
+ ListCell *lc;
+ int i;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ Assert(node->as_nplans == list_length(num_workers_list));
+
+ i = 0;
+ foreach(lc, num_workers_list)
+ {
+ /* Initialize the max workers count for each subplan. */
+ padesc->pa_info[i].pa_max_workers = lfirst_int(lc);
+
+ /*
+ * Also, initialize current number of workers. Just setting all the
+ * number of workers to 0 is enough. The logic of choosing the next
+ * plan in workers will take care of initializing everything else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+
+ i++;
+ }
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not for this
+ * purpose. The spinlock is used so that the pa_num_workers field is not
+ * changed while workers are choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ elog(DEBUG2, "Parallelappend : pid %d : finishing plan %d",
+ MyProcPid, whichplan);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];
+
+ /* Ignore plans that are already done processing */
+ if (painfo->pa_num_workers == -1)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d"
+ " since pa_num_workers is -1",
+ MyProcPid, whichplan);
+ continue;
+ }
+
+ /* Ignore plans that are already being processed by max_workers */
+ if (painfo->pa_num_workers == painfo->pa_max_workers)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d,"
+ " since reached max_worker count %d",
+ MyProcPid, whichplan, painfo->pa_max_workers);
+ continue;
+ }
+
+ /*
+ * Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
+ }
+
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+ elog(DEBUG2, "ParallelAppend : pid %d : Chosen plan : %d",
+ MyProcPid, min_whichplan);
+
+ /*
+ * If we didn't find any node to work on, it means each subplan is either
+ * finished or has reached its pa_max_workers. In such a case, should this
+ * worker wait for some subplan to have its worker count drop below its
+ * pa_max_workers so that it can choose that subplan? It turns out that
+ * it's not worth finding a subplan to work on again. A non-partial subplan
+ * anyway can have only one worker, and that worker will execute it to
+ * completion. For a partial subplan, if at all it reaches pa_max_workers,
+ * its worker count will reduce only when its workers find that there is
+ * nothing more to be executed, so there is no point taking up such a node
+ * if its worker count reduces. In conclusion, just stop executing once we
+ * don't find nodes to work on. Indicate the same by returning false.
+ */
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 30d733e..cf8d7d1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_NODE_FIELD(num_workers);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1560ac3..38e13e0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_NODE_FIELD(num_workers);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index dcfa6ee..8d0cda4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1539,6 +1539,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_NODE_FIELD(num_workers);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5c18987..c85271f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -98,7 +98,8 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
-static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_append_subpath(List *subpaths, Path *path,
+ Bitmapset **partial_subpaths_set);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1173,6 +1174,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bool subpaths_valid = true;
List *partial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ Bitmapset *partial_subpath_set = NULL;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1232,14 +1234,52 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path,
+ NULL);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
- linitial(childrel->partial_pathlist));
+ linitial(childrel->partial_pathlist),
+ &partial_subpath_set);
+ }
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
+ Path *parallel_safe_path = NULL;
+ foreach(lcp, childrel->pathlist)
+ {
+ Path *child_path = (Path *) lfirst(lcp);
+ if (child_path->parallel_safe &&
+ child_path->param_info == NULL)
+ {
+ parallel_safe_path = child_path;
+ break;
+ }
+ }
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path, NULL);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1314,7 +1354,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, subpaths,
+ NULL, NULL, 0));
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1322,26 +1363,15 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
-
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
+ int parallel_workers;
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths,
+ partial_subpath_set,
+ NULL);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
+ appendpath = create_append_path(rel, partial_subpaths,
+ partial_subpath_set,
+ NULL, parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1388,12 +1418,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
subpaths_valid = false;
break;
}
- subpaths = accumulate_append_subpath(subpaths, subpath);
+ subpaths = accumulate_append_subpath(subpaths, subpath, NULL);
}
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(rel, subpaths,
+ NULL, required_outer, 0));
}
}
@@ -1475,9 +1506,11 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_neq_total = true;
startup_subpaths =
- accumulate_append_subpath(startup_subpaths, cheapest_startup);
+ accumulate_append_subpath(startup_subpaths,
+ cheapest_startup, NULL);
total_subpaths =
- accumulate_append_subpath(total_subpaths, cheapest_total);
+ accumulate_append_subpath(total_subpaths,
+ cheapest_total, NULL);
}
/* ... and build the MergeAppend paths */
@@ -1568,6 +1601,43 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
return cheapest;
}
+/* concat_append_subpaths
+ * helper function for accumulate_append_subpath()
+ *
+ * child_partial_subpaths_set is the bitmap set to indicate which of the
+ * childpaths are partial paths. This is currently non-NULL only in case
+ * the childpaths belong to an Append path.
+ */
+static List *
+concat_append_subpaths(List *append_subpaths, List *childpaths,
+ Bitmapset **partial_subpaths_set,
+ Bitmapset *child_partial_subpaths_set)
+{
+ int i;
+ int append_subpath_len = list_length(append_subpaths);
+
+ if (partial_subpaths_set)
+ {
+ for (i = 0; i < list_length(childpaths); i++)
+ {
+ /*
+ * The child paths themselves may or may not be partial paths. The
+ * bitmapset numbers of these paths will need to be set considering
+ * that these are to be appended onto the partial_subpaths_set.
+ */
+ if (!child_partial_subpaths_set ||
+ bms_is_member(i, child_partial_subpaths_set))
+ {
+ *partial_subpaths_set = bms_add_member(*partial_subpaths_set,
+ append_subpath_len + i);
+ }
+ }
+ }
+
+ /* list_copy is important here to avoid sharing list substructure */
+ return list_concat(append_subpaths, list_copy(childpaths));
+}
+
/*
* accumulate_append_subpath
* Add a subpath to the list being built for an Append or MergeAppend
@@ -1581,26 +1651,34 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
* omitting a sort step, which seems fine: if the parent is to be an Append,
* its result would be unsorted anyway, while if the parent is to be a
* MergeAppend, there's no point in a separate sort on a child.
+ *
+ * If partial_subpaths_set is not NULL, it means we are building a
+ * partial subpaths list, and so we need to add the path (or its child paths
+ * in case it's Append or MergeAppend) into the partial_subpaths bitmap set.
*/
static List *
-accumulate_append_subpath(List *subpaths, Path *path)
+accumulate_append_subpath(List *append_subpaths, Path *path,
+ Bitmapset **partial_subpaths_set)
{
if (IsA(path, AppendPath))
{
- AppendPath *apath = (AppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(apath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((AppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ ((AppendPath*)path)->partial_subpaths);
}
else if (IsA(path, MergeAppendPath))
{
- MergeAppendPath *mpath = (MergeAppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(mpath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((MergeAppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ NULL);
}
else
- return lappend(subpaths, path);
+ return concat_append_subpaths(append_subpaths,
+ list_make1(path),
+ partial_subpaths_set,
+ NULL);
}
/*
@@ -1623,7 +1701,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 458f139..974d12d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1552,6 +1553,82 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, Relids required_outer)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost is per worker. We need total count
+ * of each of the subpaths, so that we can determine the total cost
+ * of Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up.
+ */
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6f3c20b..37e755b 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1197,7 +1197,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fae1f67..b25d53c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -29,6 +29,7 @@
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -194,7 +195,7 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, List *num_workers, List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -962,6 +963,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
Append *plan;
List *tlist = build_path_tlist(root, &best_path->path);
List *subplans = NIL;
+ List *num_workers_list;
ListCell *subpaths;
/*
@@ -1000,6 +1002,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
subplans = lappend(subplans, subplan);
}
+ /* Get a list of number of workers for each of the subplans */
+ (void) get_append_num_workers(best_path->subpaths,
+ best_path->partial_subpaths,
+ &num_workers_list);
+
/*
* XXX ideally, if there's just one child, we'd not bother to generate an
* Append node but just return the single child. At the moment this does
@@ -1007,7 +1014,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, num_workers_list, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5009,7 +5016,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, List *num_workers, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5019,6 +5026,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->num_workers = num_workers;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4b5902f..d397d1f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3394,10 +3394,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 06e843d..847c4b9 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f440875..e2ead44 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1195,6 +1195,49 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ * Optionally return the list of per-subplan worker count through
+ * 'per_subplan_workers'
+ */
+int
+get_append_num_workers(List *subpaths, Bitmapset *partial_subpaths_set,
+ List **per_subplan_workers)
+{
+ ListCell *lc;
+ int total_workers = 0;
+ int subplan_workers;
+ int i = 0;
+
+ if (per_subplan_workers)
+ *per_subplan_workers = NIL;
+
+ foreach(lc, subpaths)
+ {
+ Path *subpath = lfirst(lc);
+
+ if (bms_is_member(i, partial_subpaths_set))
+ subplan_workers = subpath->parallel_workers;
+ else
+ subplan_workers = 1;
+
+ if (per_subplan_workers)
+ {
+ *per_subplan_workers =
+ lappend_int(*per_subplan_workers, subplan_workers);
+ }
+ total_workers += subplan_workers;
+ i++;
+ }
+
+ /* In no case use more than max_parallel_workers_per_gather. */
+ total_workers = Min(total_workers,
+ max_parallel_workers_per_gather);
+
+ return total_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1202,50 +1245,28 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
- ListCell *l;
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
+
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
+ pathnode->partial_subpaths = partial_subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
- foreach(l, subpaths)
- {
- Path *subpath = (Path *) lfirst(l);
-
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
- pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
-
- /* All child paths must have same parameterization */
- Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
- }
+ cost_append(&pathnode->path, subpaths, required_outer);
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 74ca4e7..3588f09 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -895,6 +895,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
+
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f9bcdd6..a21b16d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1180,12 +1181,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f72f7a8..8c06ee0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ List *num_workers;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 643be54..ac0ff70 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1116,6 +1116,7 @@ typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+ Bitmapset *partial_subpaths;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 39376ec..ecbda74 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -98,6 +99,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths, Relids required_outer);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7b41317..425d7b9 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -61,8 +62,11 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *subpaths,
+ Bitmapset *partial_subpaths_set, List **per_subplan_workers);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 56481de..92439fe 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -1,18 +1,19 @@
SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(11 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(12 rows)
CREATE TABLE foo2(fooid int, f2 int);
INSERT INTO foo2 VALUES(1, 11);
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 18e21b7..f6c4b41 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
The v2 patch was not rebased onto the latest master branch commits. Please
refer to the attached ParallelAppend_v3.patch instead.
On 6 February 2017 at 11:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
We may want to think about a third goal: preventing too many workers
from executing the same plan. As per the comment in get_parallel_divisor()
we do not see any benefit if more than 4 workers execute the same
node. So, an append node can distribute more than 4 workers
equally among the available subplans. It might be better to do that as
a separate patch.

I think that comment is for calculating leader contribution. It does
not say that 4 workers is too many workers in general.

But yes, I agree, and I have it in mind as the next improvement.
Basically, it does not make sense to give more than 3 workers to a
subplan when parallel_workers for that subplan is 3. For example, if
the Gather max workers count is 10, and we have 2 Append subplans s1
and s2 with parallel workers 3 and 5 respectively. Then, with the
current patch, it will distribute 4 workers to each of these subplans.
What we should do is: once both of the subplans get 3 workers each, we
should give the 7th and 8th worker to s2.
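To make that concrete, here is a small standalone sketch (illustrative
only, not code from the patch) of that least-workers assignment with
per-subplan caps of 3 and 5. Gather would plan Min(3 + 5, 10) = 8
workers, and the 7th and 8th workers indeed land on s2:

#include <stdio.h>

int
main(void)
{
	int			max_workers[] = {3, 5};	/* caps for s1 and s2 */
	int			num_workers[] = {0, 0};	/* workers currently assigned */
	int			nplans = 2;
	int			w;

	for (w = 1; w <= 8; w++)
	{
		int			chosen = -1;
		int			i;

		/* pick the least-loaded subplan that is not yet saturated */
		for (i = 0; i < nplans; i++)
		{
			if (num_workers[i] >= max_workers[i])
				continue;
			if (chosen == -1 || num_workers[i] < num_workers[chosen])
				chosen = i;
		}
		printf("worker %d -> s%d\n", w, chosen + 1);
		num_workers[chosen]++;
	}
	return 0;
}

The first six workers alternate s1, s2, ..., after which s1 is
saturated at 3 workers and workers 7 and 8 both go to s2.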
Now that I think of that, I think for implementing the above, we need
to keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.

Attached is the v2 patch, which implements the above. Now the Append
plan node stores a list of per-subplan max worker counts, rather than
the Bitmapset. But still, the Bitmapset turned out to be necessary for
AppendPath. More details are in the subsequent comments.

Goal A : Allow non-partial subpaths in Partial Append.
Goal B : Distribute workers across the Append subplans.
Both of these require some kind of synchronization while choosing the
next subplan. So, goal B is achieved by doing all the synchronization
stuff. And implementation of goal A requires that goal B is
implemented. So there is a dependency between these two goals. While
implementing goal B, we should keep in mind that it should also work
for goal A; it does not make sense to later change the synchronization
logic for goal A.

I am ok with splitting the patch into 2 patches :
a) changes required for goal A
b) changes required for goal B.
But I think we should split it only when we are ready to commit them
(commit for B, immediately followed by commit for A). Until then, we
should consider both of these together because they are interconnected
as explained above.

For B, we need to know how much gain that brings and in which cases.
If that gain is not worth the complexity added, we may have to defer
Goal B. Goal A would certainly be useful since it will improve
performance of the targeted cases. The synchronization required for
Goal A is simpler than that of B and thus if we choose to implement
only A, we can live with a simpler synchronization.

For Goal A, the logic for a worker synchronously choosing a subplan
will be: go to the next subplan; if that subplan has not already been
assigned its max workers, choose this subplan, otherwise go to the next
subplan, and so on.
For Goal B, the logic will be: among the subplans which are yet to
reach their max workers, choose the subplan with the minimum number of
workers currently assigned.

I don't think there is a significant difference between the complexity
of the above two algorithms. So I think here the complexity does not
look like a factor based on which we can choose the particular logic.
We should choose the logic which has more potential for benefits. The
logic for goal B will work for goal A as well. And secondly, if the
subplans are using their own different system resources, the resource
contention might be less. One case is : all subplans using different
disks. Second case is : some of the subplans may be using a foreign
scan, so it would start using foreign server resources sooner. These
benefits apply when the Gather max workers count is not sufficient for
running all the subplans in their full capacity. If they are
sufficient, then the workers will be distributed over the subplans
using either logic; just the order of assignment of workers to
subplans will be different.

Also, I don't see a disadvantage if we follow the logic of Goal B.
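To spell the two candidate algorithms out side by side, here is a rough
sketch (illustrative only; the pa_num_workers/pa_max_workers fields
follow the patch's ParallelAppendInfo, but these helper functions are
made up):

typedef struct ParallelAppendInfo
{
	int			pa_num_workers; /* -1 means the subplan is finished */
	int			pa_max_workers;
} ParallelAppendInfo;

/* Goal A: advance to the next subplan that still has room. */
static int
choose_next_round_robin(ParallelAppendInfo *pa_info, int nplans, int prev)
{
	int			n;

	for (n = 1; n <= nplans; n++)
	{
		int			i = (prev + n) % nplans;

		if (pa_info[i].pa_num_workers >= 0 &&
			pa_info[i].pa_num_workers < pa_info[i].pa_max_workers)
			return i;
	}
	return -1;					/* i.e. PA_INVALID_PLAN */
}

/* Goal B: among the unsaturated subplans, pick the least-loaded one. */
static int
choose_next_least_loaded(ParallelAppendInfo *pa_info, int nplans)
{
	int			i;
	int			best = -1;

	for (i = 0; i < nplans; i++)
	{
		if (pa_info[i].pa_num_workers < 0 ||
			pa_info[i].pa_num_workers >= pa_info[i].pa_max_workers)
			continue;
		if (best == -1 ||
			pa_info[i].pa_num_workers < pa_info[best].pa_num_workers)
			best = i;
	}
	return best;
}

Both are O(nplans) scans done under the same spinlock; only the choice
itself differs.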
BTW, right now the patch does not consider non-partial paths for a
child which has partial paths. Do we know, for sure, that a path
containing partial paths for a child, which has them, is always going
to be cheaper than the one which includes a non-partial path? If not,
should we build another path which contains non-partial paths for all
child relations? This sounds like a 0/1 knapsack problem.

I didn't quite get this. We do create a non-partial Append path using
non-partial child paths anyway.

Here are some review comments
I will handle the other comments, but first, just a quick response to
some important ones :

6. By looking at the parallel_workers field of a path, we can say whether it's
partial or not. We probably do not require to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial append plan. The reason to take this small step is 1. we
want to minimize our work at the time of creating paths, 2. while freeing a
path in add_path, we don't free the internal structures, in this case the
Bitmap, which will waste memory if the path is not chosen while planning.

Let me try keeping the per-subplan max_worker info in the Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by max_worker field. In case of non-partial subpath,
max_worker will be 1. (this is the same info kept in AppendState node
in the patch, but now we might need to keep it in Append path node as
well).

It will be better if we can fetch that information from each subpath
when creating the plan. As I have explained before, a path is a minimal
structure, which should be easily disposable when throwing away the
path.

Now in the v2 patch, we store per-subplan worker counts. But still, we
cannot use the path->parallel_workers to determine whether it's a
partial path. This is because even for a non-partial path, it seems
the parallel_workers can be non-zero. For example, in
create_subqueryscan_path(), it sets path->parallel_workers to
subpath->parallel_workers. But this path is added as a non-partial
path. So we need a separate info as to which of the subpaths in Append
path are partial subpaths. So in the v2 patch, I continued to use
Bitmapset in AppendPath. But in Append plan node, number of workers is
calculated using this bitmapset. Check the new function
get_append_num_workers().
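For illustration (the numbers are made up), suppose an AppendPath has
four subpaths, with partial_subpaths = {0, 3}, i.e. only the first and
the last are partial, having parallel_workers 3 and 5 respectively.
get_append_num_workers() then computes:

	subpath 0 : partial, parallel_workers = 3  ->  3 workers
	subpath 1 : non-partial                    ->  1 worker
	subpath 2 : non-partial                    ->  1 worker
	subpath 3 : partial, parallel_workers = 5  ->  5 workers

so per_subplan_workers is (3, 1, 1, 5), and the returned total is
Min(3 + 1 + 1 + 5, max_parallel_workers_per_gather).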
7. If we consider 6, we don't need concat_append_subpaths(),

As explained above, I have kept the Bitmapset for AppendPath.

but still here are some comments about that function. Instead of
accepting two separate arguments, childpaths and
child_partial_subpaths_set, which need to be in sync, we can just pass
the path which contains both of those. In the same way, the following
code may be optimized by adding a utility function to Bitmapset which
advances all members by a given offset, and using that function with
bms_union() to merge the bitmapsets, e.g.

bms_union(*partial_subpaths_set,
          bms_advance_members(bms_copy(child_partial_subpaths_set),
                              append_subpath_len));
if (partial_subpaths_set)

I will get back on this after more thought.
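For what it's worth, such a helper could look roughly like this (it is
hypothetical; there is no bms_advance_members() in bitmapset.c today),
expressed in terms of the existing bms_next_member()/bms_add_member()
API:

static Bitmapset *
bms_advance_members(Bitmapset *a, int offset)
{
	Bitmapset  *result = NULL;
	int			x = -1;

	/* rebuild the set with every member shifted by 'offset' */
	while ((x = bms_next_member(a, x)) >= 0)
		result = bms_add_member(result, x + offset);

	bms_free(a);				/* 'a' is the bms_copy() made by the caller */
	return result;
}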
12. cost_append() essentially adds costs of all the subpaths and then divides
by parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing all the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.WIth the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something. With the current
parallelism costing infrastructure, I am not sure though.The current parallel mechanism is in sync with that costing. Each
worker is supposed to take the same burden, hence the same (average)
cost. But it will change when a single worker has to scan an entire
child relation and different child relations have different sizes.

I gave more thought on this. Considering each subplan has a different
number of workers, I think it makes sense to calculate average
per-worker cost even in parallel Append. In case of non-partial
subplan, a single worker will execute it, but it will next choose
another subplan. So on average each worker is going to process the
same number of rows, and also the same amount of CPU. And that amount
of CPU cost and rows cost should be calculated by taking the total
count and dividing it by number of workers (parallel_divsor actually).Here are some review comments
Here are some review comments

1. struct ParallelAppendDescData is being used at other places. The
declaration style doesn't seem to be very common in the code or in the
directory where the file is located.

+struct ParallelAppendDescData
+{
+	slock_t		pa_mutex;	/* mutual exclusion to choose next subplan */
+	parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
+};

Defining it like

typedef struct ParallelAppendDescData
{
	slock_t		pa_mutex;	/* mutual exclusion to choose next subplan */
	parallel_append_info pa_info[FLEXIBLE_ARRAY_MEMBER];
} ParallelAppendDescData;

will make its use handy. Instead of struct ParallelAppendDescData, you
will need to use just ParallelAppendDescData. Maybe we want to rename
parallel_append_info as ParallelAppendInfo and change the style to
match other declarations.

2. The comment below refers to the constant which it describes, which looks
odd. Maybe it should be worded as "A special value of
AppendState::as_whichplan, to indicate no plans left to be executed.". Also
using INVALID for "no plans left ..." seems to be a misnomer.
/*
* For Parallel Append, AppendState::as_whichplan can have PA_INVALID_PLAN
* value, which indicates there are no plans left to be executed.
*/
#define PA_INVALID_PLAN -1

3. The sentence "We have got NULL" looks odd. Probably we don't need it as
it's clear from the code above that this code deals with the case when the
current subplan didn't return any row.
/*
* We have got NULL. There might be other workers still processing the
* last chunk of rows for this same node, but there's no point for new
* workers to run this node, so mark this node as finished.
*/
4. In the same comment, I guess, the word "node" refers to "subnode" and not
the node pointed to by the variable "node". Maybe you want to use the
word "subplan" here.

4. set_finished()'s prologue has different indentation compared to other
functions in the file.

5. Multilevel comment starts with an empty line.

+	/* Keep track of the node with the least workers so far. For the very

Done 1. to 5. above, as per your suggestions.
9. Shouldn't this function return double?

int
get_parallel_divisor(int parallel_workers)

The v2 patch is rebased on the latest master branch, which already
contains this function returning double.

10. We should probably move the parallel_safe calculation out of
cost_append().

+	path->parallel_safe = path->parallel_safe &&
+		subpath->parallel_safe;

11. This check shouldn't be part of cost_append().

+	/* All child paths must have same parameterization */
+	Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));

Yet to handle the above ones.
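For 10 and 11, a rough sketch of the direction (not a finished change)
would be to keep cost_append() purely about rows and costs, and have
create_append_path() fold in parallel-safety and assert the
parameterization itself, along these lines:

	foreach(l, subpaths)
	{
		Path	   *subpath = (Path *) lfirst(l);

		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
			subpath->parallel_safe;

		/* All child paths must have same parameterization */
		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
	}

	cost_append(&pathnode->path, subpaths, required_outer);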
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments: ParallelAppend_v3.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index e01fe6d..0b50ab9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeSeqscan.h"
@@ -201,6 +202,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -249,6 +254,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecSeqScanInitializeDSM((SeqScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
@@ -725,6 +734,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
case T_SeqScanState:
ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..97bfc89 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,48 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendInfo
+{
+ int pa_num_workers; /* workers currently executing the subplan */
+ int pa_max_workers; /* max workers that should run the subplan */
+} ParallelAppendInfo;
+
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +116,22 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ return parallel_append_next(appendstate);
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +233,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +252,14 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we are already finished plans from parallel append */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+ MyProcPid);
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +281,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +336,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +351,223 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ * pa_max_workers is already set initially.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+ List *num_workers_list = ((Append*)node->ps.plan)->num_workers;
+ ListCell *lc;
+ int i;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ Assert(node->as_nplans == list_length(num_workers_list));
+
+ i = 0;
+ foreach(lc, num_workers_list)
+ {
+ /* Initialize the max workers count for each subplan. */
+ padesc->pa_info[i].pa_max_workers = lfirst_int(lc);
+
+ /*
+ * Also, initialize current number of workers. Just setting all the
+ * number of workers to 0 is enough. The logic of choosing the next
+ * plan in workers will take care of initializing everything else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+
+ i++;
+ }
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not for this
+ * purpose. The spinlock is used so that the pa_num_workers fields do not
+ * change while workers are choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ elog(DEBUG2, "Parallelappend : pid %d : finishing plan %d",
+ MyProcPid, whichplan);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];
+
+ /* Ignore plans that are already done processing */
+ if (painfo->pa_num_workers == -1)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d"
+ " since pa_num_workers is -1",
+ MyProcPid, whichplan);
+ continue;
+ }
+
+ /* Ignore plans that are already being processed by max_workers */
+ if (painfo->pa_num_workers == painfo->pa_max_workers)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d,"
+ " since reached max_worker count %d",
+ MyProcPid, whichplan, painfo->pa_max_workers);
+ continue;
+ }
+
+ /*
+ * Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
+ }
+
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+ elog(DEBUG2, "ParallelAppend : pid %d : Chosen plan : %d",
+ MyProcPid, min_whichplan);
+
+ /*
+ * If we didn't find any node to work on, it means each subplan is either
+ * finished or has reached its pa_max_workers. In such a case, should this
+ * worker wait for some subplan to have its worker count drop below its
+ * pa_max_workers so that it can choose that subplan? It turns out that
+ * it's not worth waiting for such a subplan. A non-partial subplan
+ * anyway can have only one worker, and that worker will execute it to
+ * completion. For a partial subplan, if at all it reaches pa_max_workers,
+ * its worker count will reduce only when its workers find that there is
+ * nothing more to be executed, so there is no point taking up such a node
+ * if its worker count reduces. In conclusion, just stop executing once we
+ * don't find nodes to work on. Indicate the same by returning false.
+ */
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 30d733e..cf8d7d1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_NODE_FIELD(num_workers);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1560ac3..38e13e0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_NODE_FIELD(num_workers);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index dcfa6ee..8d0cda4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1539,6 +1539,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_NODE_FIELD(num_workers);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5c18987..c85271f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -98,7 +98,8 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
-static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_append_subpath(List *subpaths, Path *path,
+ Bitmapset **partial_subpaths_set);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1173,6 +1174,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bool subpaths_valid = true;
List *partial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ Bitmapset *partial_subpath_set = NULL;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1232,14 +1234,52 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path,
+ NULL);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
- linitial(childrel->partial_pathlist));
+ linitial(childrel->partial_pathlist),
+ &partial_subpath_set);
+ }
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
+ Path *parallel_safe_path = NULL;
+ foreach(lcp, childrel->pathlist)
+ {
+ Path *child_path = (Path *) lfirst(lcp);
+ if (child_path->parallel_safe &&
+ child_path->param_info == NULL)
+ {
+ parallel_safe_path = child_path;
+ break;
+ }
+ }
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path, NULL);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1314,7 +1354,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, subpaths,
+ NULL, NULL, 0));
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1322,26 +1363,15 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
-
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
+ int parallel_workers;
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths,
+ partial_subpath_set,
+ NULL);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
+ appendpath = create_append_path(rel, partial_subpaths,
+ partial_subpath_set,
+ NULL, parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1388,12 +1418,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
subpaths_valid = false;
break;
}
- subpaths = accumulate_append_subpath(subpaths, subpath);
+ subpaths = accumulate_append_subpath(subpaths, subpath, NULL);
}
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(rel, subpaths,
+ NULL, required_outer, 0));
}
}
@@ -1475,9 +1506,11 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_neq_total = true;
startup_subpaths =
- accumulate_append_subpath(startup_subpaths, cheapest_startup);
+ accumulate_append_subpath(startup_subpaths,
+ cheapest_startup, NULL);
total_subpaths =
- accumulate_append_subpath(total_subpaths, cheapest_total);
+ accumulate_append_subpath(total_subpaths,
+ cheapest_total, NULL);
}
/* ... and build the MergeAppend paths */
@@ -1568,6 +1601,43 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
return cheapest;
}
+/* concat_append_subpaths
+ * helper function for accumulate_append_subpath()
+ *
+ * child_partial_subpaths_set is the bitmap set to indicate which of the
+ * childpaths are partial paths. This is currently non-NULL only in case
+ * the childpaths belong to an Append path.
+ */
+static List *
+concat_append_subpaths(List *append_subpaths, List *childpaths,
+ Bitmapset **partial_subpaths_set,
+ Bitmapset *child_partial_subpaths_set)
+{
+ int i;
+ int append_subpath_len = list_length(append_subpaths);
+
+ if (partial_subpaths_set)
+ {
+ for (i = 0; i < list_length(childpaths); i++)
+ {
+ /*
+ * The child paths themselves may or may not be partial paths. The
+ * bitmapset numbers of these paths will need to be set considering
+ * that these are to be appended onto the partial_subpaths_set.
+ */
+ if (!child_partial_subpaths_set ||
+ bms_is_member(i, child_partial_subpaths_set))
+ {
+ *partial_subpaths_set = bms_add_member(*partial_subpaths_set,
+ append_subpath_len + i);
+ }
+ }
+ }
+
+ /* list_copy is important here to avoid sharing list substructure */
+ return list_concat(append_subpaths, list_copy(childpaths));
+}
+
/*
* accumulate_append_subpath
* Add a subpath to the list being built for an Append or MergeAppend
@@ -1581,26 +1651,34 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
* omitting a sort step, which seems fine: if the parent is to be an Append,
* its result would be unsorted anyway, while if the parent is to be a
* MergeAppend, there's no point in a separate sort on a child.
+ *
+ * If partial_subpaths_set is not NULL, it means we are building a
+ * partial subpaths list, and so we need to add the path (or its child paths
+ * in case it's Append or MergeAppend) into the partial_subpaths bitmap set.
*/
static List *
-accumulate_append_subpath(List *subpaths, Path *path)
+accumulate_append_subpath(List *append_subpaths, Path *path,
+ Bitmapset **partial_subpaths_set)
{
if (IsA(path, AppendPath))
{
- AppendPath *apath = (AppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(apath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((AppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ ((AppendPath*)path)->partial_subpaths);
}
else if (IsA(path, MergeAppendPath))
{
- MergeAppendPath *mpath = (MergeAppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(mpath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((MergeAppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ NULL);
}
else
- return lappend(subpaths, path);
+ return concat_append_subpaths(append_subpaths,
+ list_make1(path),
+ partial_subpaths_set,
+ NULL);
}
/*
@@ -1623,7 +1701,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a43daa7..895e5e6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1515,6 +1516,82 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, Relids required_outer)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per-worker figures. We need the
+ * totals for each of the subpaths, so that we can determine the
+ * total cost of the Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * The Append starts returning tuples when the child node having
+ * the lowest startup cost is done setting up. Seed the running
+ * minimum from the first subpath rather than from the initial
+ * zero, which would otherwise always win the Min().
+ */
+ if (l == list_head(subpaths))
+ path->startup_cost = subpath->startup_cost;
+ else
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6f3c20b..37e755b 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1197,7 +1197,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 997bdcf..d3491d5 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -29,6 +29,7 @@
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -194,7 +195,7 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, List *num_workers, List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -962,6 +963,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
Append *plan;
List *tlist = build_path_tlist(root, &best_path->path);
List *subplans = NIL;
+ List *num_workers_list;
ListCell *subpaths;
/*
@@ -1000,6 +1002,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
subplans = lappend(subplans, subplan);
}
+ /* Get a list of number of workers for each of the subplans */
+ (void) get_append_num_workers(best_path->subpaths,
+ best_path->partial_subpaths,
+ &num_workers_list);
+
/*
* XXX ideally, if there's just one child, we'd not bother to generate an
* Append node but just return the single child. At the moment this does
@@ -1007,7 +1014,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, num_workers_list, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5009,7 +5016,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, List *num_workers, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5019,6 +5026,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->num_workers = num_workers;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 881742f..329e7d8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3380,10 +3380,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 06e843d..847c4b9 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f440875..e2ead44 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1195,6 +1195,49 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for a partial Append path.
+ * Optionally return the list of per-subplan worker counts through
+ * 'per_subplan_workers'.
+ */
+int
+get_append_num_workers(List *subpaths, Bitmapset *partial_subpaths_set,
+ List **per_subplan_workers)
+{
+ ListCell *lc;
+ int total_workers = 0;
+ int subplan_workers;
+ int i = 0;
+
+ if (per_subplan_workers)
+ *per_subplan_workers = NIL;
+
+ foreach(lc, subpaths)
+ {
+ Path *subpath = lfirst(lc);
+
+ if (bms_is_member(i, partial_subpaths_set))
+ subplan_workers = subpath->parallel_workers;
+ else
+ subplan_workers = 1;
+
+ if (per_subplan_workers)
+ {
+ *per_subplan_workers =
+ lappend_int(*per_subplan_workers, subplan_workers);
+ }
+ total_workers += subplan_workers;
+ i++;
+ }
+
+ /* In no case use more than max_parallel_workers_per_gather. */
+ total_workers = Min(total_workers,
+ max_parallel_workers_per_gather);
+
+ return total_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1202,50 +1245,28 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
- ListCell *l;
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
+
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
+ pathnode->partial_subpaths = partial_subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
- foreach(l, subpaths)
- {
- Path *subpath = (Path *) lfirst(l);
-
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
- pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
-
- /* All child paths must have same parameterization */
- Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
- }
+ cost_append(&pathnode->path, subpaths, required_outer);
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c53aede..97e5a39 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -895,6 +895,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
+
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f9bcdd6..a21b16d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1180,12 +1181,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f72f7a8..8c06ee0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ List *num_workers;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 643be54..ac0ff70 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1116,6 +1116,7 @@ typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+ Bitmapset *partial_subpaths;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 0e68264..875d3ed 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -98,6 +99,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths, Relids required_outer);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7b41317..425d7b9 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -61,8 +62,11 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *subpaths,
+ Bitmapset *partial_subpaths_set, List **per_subplan_workers);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 18e21b7..f6c4b41 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index d48abd7..7a303fa 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,20 +70,21 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(11 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(12 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
On Mon, Feb 6, 2017 at 11:06 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
We may want to think about a third goal: preventing too many workers
from executing the same plan. As per the comment in get_parallel_divisor(),
we do not see any benefit if more than 4 workers execute the same
node. So, an Append node could distribute more than 4 workers
equally among the available subplans. It might be better to do that as
a separate patch.
I think that comment is about calculating the leader's contribution. It does
not say that 4 workers is too many in general.
But yes, I agree, and I have it in mind as the next improvement.
Basically, it does not make sense to give more than 3 workers to a
subplan whose parallel_workers is 3. For example, suppose the Gather
max workers count is 10, and we have two Append subplans s1 and s2 with
parallel_workers of 3 and 5 respectively. With the current patch,
it will distribute 4 workers to each of these subplans. What we should
do instead is: once both of the subplans get 3 workers each, give
the 7th and 8th workers to s2.
Now that I think of that, I think for implementing the above, we need to
keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.
Attached is the v2 patch, which implements the above. Now the Append plan node
stores a list of per-subplan max worker counts, rather than the
Bitmapset. But the Bitmapset still turned out to be necessary for
AppendPath. More details are in the subsequent comments.
Goal A : Allow non-partial subpaths in Partial Append.
Goal B : Distribute workers across the Append subplans.
Both of these require some kind of synchronization while choosing the
next subplan. So, goal B is achieved by doing all the synchronization
stuff. And the implementation of goal A requires that goal B is
implemented. So there is a dependency between these two goals. While
implementing goal B, we should keep in mind that it should also work
for goal A; it does not make sense to later change the synchronization
logic in goal A.
I am ok with splitting the patch into 2 patches :
a) changes required for goal A
b) changes required for goal B.
But I think we should split it only when we are ready to commit them
(commit for B, immediately followed by commit for A). Until then, we
should consider both of these together because they are interconnected
as explained above.
For B, we need to know how much gain it brings and in which cases.
If that gain is not worth the added complexity, we may have to defer
Goal B. Goal A would certainly be useful since it will improve
performance of the targeted cases. The synchronization required for
Goal A is simpler than that of B, and thus if we choose to implement
only A, we can live with a simpler synchronization.
For Goal A, the logic for a worker synchronously choosing a subplan will be :
Go to the next subplan. If that subplan has not already been assigned its
max workers, choose it; otherwise, go to the next subplan, and so
on. (A sketch of this selection logic follows below.)
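To make the two selection policies concrete, here is a minimal, hypothetical sketch in C, assuming a shared array of per-subplan worker counts consulted under a lock. The names and structure are illustrative only, not taken from the patch; the Goal B variant, described a little further below, differs only in how it scans the array.

/* Hypothetical per-subplan bookkeeping; illustrative only. */
typedef struct SubplanSlot
{
    int     cur_workers;    /* workers currently executing this subplan */
    int     max_workers;    /* 1 for non-partial subplans, else parallel_workers */
} SubplanSlot;

/* Goal A: round-robin to the next subplan that still has room. */
static int
choose_next_subplan_goal_a(SubplanSlot *slots, int nplans, int start)
{
    int     i;

    for (i = 0; i < nplans; i++)
    {
        int     plan = (start + i) % nplans;

        if (slots[plan].cur_workers < slots[plan].max_workers)
            return plan;
    }
    return -1;              /* all subplans fully subscribed */
}

/* Goal B: pick the least-loaded subplan that still has room. */
static int
choose_next_subplan_goal_b(SubplanSlot *slots, int nplans)
{
    int     best = -1;
    int     i;

    for (i = 0; i < nplans; i++)
    {
        if (slots[i].cur_workers >= slots[i].max_workers)
            continue;
        if (best < 0 || slots[i].cur_workers < slots[best].cur_workers)
            best = i;
    }
    return best;            /* caller increments cur_workers under the lock */
}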
Right, at a given time, we have to remember only the next plan to
assign a worker to. That's simpler than remembering the number of
workers for each subplan and updating those concurrently. That's why I
am saying the synchronization for A is simpler than that of B.
For Goal B, the logic will be :
Among the subplans which are yet to reach their max workers, choose the
subplan with the minimum number of workers currently assigned.
I don't think there is a significant difference between the complexity
of the above two algorithms. So I don't think complexity is the
factor on which to choose between them.
We should choose the logic which has more potential for benefit. The
logic for goal B will work for goal A as well. And secondly, if the
subplans are using their own different system resources, the resource
contention might be less. One case is: all subplans using different
disks. A second case is: some of the subplans may be using a foreign
scan, so they would start using foreign server resources sooner. These
benefits apply when the Gather max workers count is not sufficient for
running all the subplans at their full capacity. If it is
sufficient, then the workers will be distributed over the subplans
under either logic; just the order of assigning workers to
subplans will differ. Also, I don't see a disadvantage to following the
logic of Goal B.
Do we have any performance measurements where we see that Goal B
performs better than Goal A in such a situation? Do we have any
performance measurements comparing these two approaches in other
situations? If the implementation of Goal B always beats that of Goal A,
we can certainly implement it directly. But it may not. Also,
separating the patches for Goal A and Goal B might make reviews easier.
BTW, right now, the patch does not consider non-partial paths for a
child which has partial paths. Do we know for sure that a path
using the partial path for a child that has one is always going to
be cheaper than one which uses the non-partial path? If not,
should we build other paths which use non-partial paths for some or all
child relations? This sounds like a 0/1 knapsack problem.
I didn't quite get this. We do create a non-partial Append path using
non-partial child paths anyway.
Let's say a given child relation has both partial and non-partial
paths; your approach would always pick a partial path. But now that
Parallel Append can handle non-partial paths as well, it may happen
that picking the non-partial path instead of the partial one, when both are
available, gives overall better performance. Have we ruled out that
possibility?
Here are some review comments.
I will handle the other comments, but first, just a quick response to
some important ones:
6. By looking at the parallel_workers field of a path, we can say whether it's
partial or not. We probably do not need to maintain a bitmap for that in
the Append path. The bitmap can be constructed, if required, at the time of
creating the partial Append plan. The reasons to take this small step are: 1. we
want to minimize our work at the time of creating paths, 2. while freeing a
path in add_path(), we don't free the internal structures, in this case the
Bitmapset, which will waste memory if the path is not chosen while planning.
Let me try keeping the per-subplan max_worker info in the Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by a max_worker field. In case of a non-partial subpath,
max_worker will be 1. (This is the same info kept in the AppendState node
in the patch, but now we might need to keep it in the Append path node as
well.)
It will be better if we can fetch that information from each subpath
when creating the plan. As I have explained before, a path is a minimal
structure, which should be easily disposable when throwing away the
path.
Now in the v2 patch, we store the per-subplan worker count. But still, we
cannot use path->parallel_workers to determine whether it's a
partial path. This is because even for a non-partial path, it seems
the parallel_workers can be non-zero. For example,
create_subqueryscan_path() sets path->parallel_workers to
subpath->parallel_workers, but this path is added as a non-partial
path. So we need separate info as to which of the subpaths in the Append
path are partial subpaths. So in the v2 patch, I continued to use a
Bitmapset in AppendPath, but in the Append plan node, the number of workers is
calculated using this bitmapset. See the new function
get_append_num_workers(); a worked example follows below.
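As a worked illustration (hypothetical numbers, not from the thread): suppose an Append has four subpaths whose parallel_workers are 3, 0, 0 and 5, and the partial-subpath bitmapset is {0, 3}. The two partial subpaths contribute their own parallel_workers (3 and 5), and the two non-partial ones contribute 1 each, so get_append_num_workers() returns Min(3 + 1 + 1 + 5, max_parallel_workers_per_gather), i.e. 10 if the GUC allows it.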
If the subpath comes from childrel->partial_pathlist, then we set the
corresponding bit in the bitmap. But we can infer that for any path by
checking whether the path appears in path->parent->partial_pathlist. Since
the code always chooses the first partial path, the search in partial_pathlist
should not affect performance. So, we can avoid maintaining a bitmap
in the path and accumulating it when collapsing Append paths (a sketch of
this check follows below).
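A minimal sketch of that suggested check, using the existing list API; the helper name is made up for illustration:

/*
 * Hypothetical helper: infer whether a subpath is partial by checking
 * whether its parent rel links to it from partial_pathlist.
 */
static bool
subpath_is_partial(Path *subpath)
{
    return list_member_ptr(subpath->parent->partial_pathlist, subpath);
}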
7. If we consider 6, we don't need concat_append_subpaths().
As explained above, I have kept the Bitmapset for AppendPath.
But still, here are
some comments about that function. Instead of accepting two separate arguments,
childpaths and child_partial_subpaths_set, which need to be in sync, we can
just pass the path which contains both of those. Similarly, the following code may
be optimized by adding a utility function to Bitmapset which advances
all members by a given offset, and using that function with bms_union() to merge
the bitmapsets, e.g.
bms_union(*partial_subpaths_set,
bms_advance_members(bms_copy(child_partial_subpaths_set), append_subpath_len));
I will get back on this after more thought.
Another possibility: you could use a loop like offset_relid_set(),
using bms_next_member(). That way we could combine the for loop and
the bms_is_member() call into a single loop over bms_next_member().
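A sketch of that alternative, reusing the variable names from the patch's concat_append_subpaths() (illustrative only):

/* Iterate over the set bits of the child's bitmapset and re-add them
 * shifted by the number of subpaths already accumulated. */
int member = -1;

while ((member = bms_next_member(child_partial_subpaths_set, member)) >= 0)
    *partial_subpaths_set = bms_add_member(*partial_subpaths_set,
                                           append_subpath_len + member);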
12. cost_append() essentially adds the costs of all the subpaths and then divides
by the parallel_divisor. This might work if all the subpaths are partial paths. But
for the subpaths which are not partial, a single worker will incur the whole
cost of that subpath. Hence just dividing the total cost doesn't seem the
right thing to do. We should apply different logic for costing non-partial
subpaths and partial subpaths.
With the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something. With the current
parallelism costing infrastructure, I am not sure though.
The current parallel mechanism is in sync with that costing. Each
worker is supposed to take the same burden, hence the same (average)
cost. But that changes when a single worker has to scan an entire
child relation and different child relations have different sizes.
I gave more thought to this. Considering each subplan has a different
number of workers, I think it makes sense to calculate the average
per-worker cost even in parallel Append. In case of a non-partial
subplan, a single worker will execute it, but it will next choose
another subplan. So on average each worker is going to process the
same number of rows, and also the same amount of CPU. And that amount
of CPU cost and rows cost should be calculated by taking the total
count and dividing it by the number of workers (the parallel_divisor, actually).
That's not entirely true. Consider N child relations with chosen paths
whose costs C1, C2, ... CN are very different. If there are
N workers, the total cost should correspond to the highest of the
costs of the subpaths, since no worker will execute more than one plan.
The unfortunate worker which executes the costliest path would take
the longest time. The cost of Parallel Append should reflect that. The
patch does not make any attempt to distribute workers based on the
actual load, so such skews should be accounted for in the costing. I don't
think we can do anything about the condition I explained.
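As a concrete (hypothetical) illustration of that skew: with three non-partial children costing 10, 10 and 100, and three workers, the averaged estimate is (10 + 10 + 100) / 3 = 40, but the Append cannot finish before the worker running the cost-100 child does, so the effective total cost is closer to 100.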
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Feb 6, 2017 at 12:36 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Now that I think of that, I think for implementing above, we need to
keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.
Attached is v2 patch, which implements above. Now Append plan node
stores a list of per-subplan max worker count, rather than the
Bitmapset. But still Bitmapset turned out to be necessary for
AppendPath. More details are in the subsequent comments.
Keep in mind that, for a non-partial path, the cap of 1 worker for
that subplan is a hard limit. Anything more will break the world.
But for a partial plan, the limit -- whether 1 or otherwise -- is a
soft limit. It may not help much to route more workers to that node,
and conceivably it could even hurt, but it shouldn't yield any
incorrect result. I'm not sure it's a good idea to conflate those two
things. For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high. Had those two tables been
a single unpartitioned table, I would have picked 4 or 5 workers, not
7. On the other hand, if I pick parallel_workers of 4 or 5 for the
Parallel Append, and I finish with the larger table first, I think I
might as well throw all 4 of those workers at the smaller table even
though it would normally have only used 3 workers. Having the extra
1-2 workers exist does not seem better.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 14, 2017 at 12:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Having the extra
1-2 workers exist does not seem better.
Err, exit, not exist.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 6, 2017 at 12:36 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Now that I think of that, I think for implementing above, we need to
keep track of per-subplan max_workers in the Append path; and with
that, the bitmap will be redundant. Instead, it can be replaced with
max_workers. Let me check if it is easy to do that. We don't want to
have the bitmap if we are sure it would be replaced by some other data
structure.
Attached is v2 patch, which implements above. Now Append plan node
stores a list of per-subplan max worker count, rather than the
Bitmapset. But still Bitmapset turned out to be necessary for
AppendPath. More details are in the subsequent comments.
Keep in mind that, for a non-partial path, the cap of 1 worker for
that subplan is a hard limit. Anything more will break the world.
But for a partial plan, the limit -- whether 1 or otherwise -- is a
soft limit. It may not help much to route more workers to that node,
and conceivably it could even hurt, but it shouldn't yield any
incorrect result. I'm not sure it's a good idea to conflate those two
things.
Yes, the logic that I used in the patch assumes that
"the Path->parallel_workers field not only suggests how many workers to
allocate, but also prevents allocation of too many workers for that
path". For a seqscan path, this field is calculated based on the
relation's page count. I believe the theory is that too many workers
might even slow down the parallel scan, and the same theory would
apply when calculating other types of low-level paths, like index
scans.
The only reason I combined the soft limit and the hard limit is
that it is not necessary to have two different fields. But of
course this is again under the assumption that allocating more than
parallel_workers would never improve the speed; in fact, it can even
slow it down.
Do we currently have a case where the actual number of workers
launched turns out to be *more* than Path->parallel_workers?
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high. Had those two tables been
a single unpartitioned table, I would have picked 4 or 5 workers, not
7. On the other hand, if I pick parallel_workers of 4 or 5 for the
Parallel Append, and I finish with the larger table first, I think I
might as well throw all 4 of those workers at the smaller table even
though it would normally have only used 3 workers.
Having the extra 1-2 workers exit does not seem better.
It is here that I didn't understand exactly why we would want to
assign these extra workers to a subplan which tells us that it is
already being run by 'parallel_workers' number of workers.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high.
In the patch, in such a case, 7 workers are indeed selected for the Parallel
Append path, so that both subplans are able to execute in parallel
at their full worker capacity. Are you suggesting that we should not?
On Wed, Feb 15, 2017 at 2:33 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
The only reason I combined the soft limit and the hard limit is
because it is not necessary to have two different fields. But of
course this is again under the assumption that allocating more than
parallel_workers would never improve the speed, in fact it can even
slow it down.
That could be true in extreme cases, but in general I think it's probably false.
Do we have such a case currently where the actual number of workers
launched turns out to be *more* than Path->parallel_workers ?
No.
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high. Had those two tables been
a single unpartitioned table, I would have picked 4 or 5 workers, not
7. On the other hand, if I pick parallel_workers of 4 or 5 for the
Parallel Append, and I finish with the larger table first, I think I
might as well throw all 4 of those workers at the smaller table even
though it would normally have only used 3 workers.
Having the extra 1-2 workers exit does not seem better.
It is here that I didn't understand exactly why we would want to
assign these extra workers to a subplan which tells us that it is
already being run by 'parallel_workers' number of workers.
The decision to use fewer workers for a smaller scan isn't really
because we think that using more workers will cause a regression.
It's because we think it may not help very much, and because it's not
worth firing up a ton of workers for a relatively small scan given
that workers are a limited resource. I think once we've got a bunch
of workers started, we might as well try to use them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 15, 2017 at 4:43 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high.
In the patch, in such a case, 7 workers are indeed selected for the Parallel
Append path, so that both subplans are able to execute in parallel
at their full worker capacity. Are you suggesting that we should not?
Absolutely. I think that's going to be way too many workers. Imagine
that there are 100 child tables and each one is big enough to qualify
for 2 or 3 workers. No matter what value the user has selected for
max_parallel_workers_per_gather, they should not get a scan involving
200 workers.
What I was thinking about is something like this:
1. First, take the maximum parallel_workers value from among all the children.
2. Second, compute log2(num_children)+1 and round up. So, for 1
child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
for 9-16 children, 5, and so on.
3. Use as the number of parallel workers for the children the maximum
of the value computed in step 1 and the value computed in step 2.
With this approach, a plan with 100 children qualifies for 8 parallel
workers (unless one of the children individually qualifies for some
larger number, or unless max_parallel_workers_per_gather is set to a
smaller value). That seems fairly reasonable to me.
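A minimal sketch of this proposed heuristic; the function name and clamping are illustrative, and only the three steps come from the proposal above:

#include <math.h>

static int
proposed_append_workers(int max_child_workers, int num_children,
                        int max_workers_per_gather)
{
    /* Step 2: log2(num_children) + 1, rounded up. */
    int     log2_bound = (int) ceil(log2((double) num_children)) + 1;
    /* Step 3: maximum of the step 1 and step 2 values ... */
    int     workers = (max_child_workers > log2_bound) ?
        max_child_workers : log2_bound;

    /* ... never exceeding max_parallel_workers_per_gather. */
    return (workers < max_workers_per_gather) ?
        workers : max_workers_per_gather;
}

For example, proposed_append_workers(3, 100, 10) yields Max(3, 8) = 8, matching the 100-children example above.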
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 15, 2017 at 6:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 15, 2017 at 4:43 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high.
In the patch, in such a case, 7 workers are indeed selected for the Parallel
Append path, so that both subplans are able to execute in parallel
at their full worker capacity. Are you suggesting that we should not?
Absolutely. I think that's going to be way too many workers. Imagine
that there are 100 child tables and each one is big enough to qualify
for 2 or 3 workers. No matter what value the user has selected for
max_parallel_workers_per_gather, they should not get a scan involving
200 workers.
If the user is ready to throw 200 workers, and if the subplans can use
them to speed up the query 200 times (obviously I am exaggerating),
why not use them? When the user sets
max_parallel_workers_per_gather to that high a number, he means it to
be used by a Gather, and that's what we should be doing.
What I was thinking about is something like this:
1. First, take the maximum parallel_workers value from among all the children.
2. Second, compute log2(num_children)+1 and round up. So, for 1
child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
for 9-16 children, 5, and so on.
Can you please explain the rationale behind this maths?
3. Use as the number of parallel workers for the children the maximum
of the value computed in step 1 and the value computed in step 2.
With this approach, a plan with 100 children qualifies for 8 parallel
workers (unless one of the children individually qualifies for some
larger number, or unless max_parallel_workers_per_gather is set to a
smaller value). That seems fairly reasonable to me.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 15 February 2017 at 18:40, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 15, 2017 at 4:43 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high.
In the patch, in such a case, 7 workers are indeed selected for the Parallel
Append path, so that both subplans are able to execute in parallel
at their full worker capacity. Are you suggesting that we should not?
Absolutely. I think that's going to be way too many workers. Imagine
that there are 100 child tables and each one is big enough to qualify
for 2 or 3 workers. No matter what value the user has selected for
max_parallel_workers_per_gather, they should not get a scan involving
200 workers.
What I was thinking about is something like this:
1. First, take the maximum parallel_workers value from among all the children.
2. Second, compute log2(num_children)+1 and round up. So, for 1
child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
for 9-16 children, 5, and so on.
3. Use as the number of parallel workers for the children the maximum
of the value computed in step 1 and the value computed in step 2.
Ah, now that I closely look at compute_parallel_worker(), I see what
you are getting at.
For a plain unpartitioned table, parallel_workers is calculated as
roughly equal to log(num_pages) (actually it is log3). So if the table
size is n, the workers will be log(n). So if it is partitioned into p
partitions of size n/p each, the number of workers should still be
log(n). Whereas, in the patch, it is calculated as the total of all the
child workers, i.e. p * log(n/p) for this case. But log(n) != p *
log(n/p). For example, log(1000) is much less than log(300) + log(300) +
log(300).
That means the way it is calculated in the patch turns out to be much
larger than if it were calculated using log(total of sizes of all
children). So I think for step 2 above, a log(total_rel_size)
formula seems appropriate. What do you think? (For
compute_parallel_worker(), it is actually log3, by the way.)
BTW this formula is just an extension of how parallel_workers is
calculated for an unpartitioned table.
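To put rough numbers on that (illustrative arithmetic only): with base-3 logs, log3(900) is about 6.2, while three partitions of 300 pages each would give 3 * log3(300), about 15.6. So summing per-child worker counts can easily double or triple the estimate relative to treating the partitions as a single relation.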
For example, suppose that I have a scan of two children, one
of which has parallel_workers of 4, and the other of which has
parallel_workers of 3. If I pick parallel_workers of 7 for the
Parallel Append, that's probably too high. Had those two tables been
a single unpartitioned table, I would have picked 4 or 5 workers, not
7. On the other hand, if I pick parallel_workers of 4 or 5 for the
Parallel Append, and I finish with the larger table first, I think I
might as well throw all 4 of those workers at the smaller table even
though it would normally have only used 3 workers.
Having the extra 1-2 workers exit does not seem better.
It is here that I didn't understand exactly why we would want to
assign these extra workers to a subplan which tells us that it is
already being run by 'parallel_workers' number of workers.
The decision to use fewer workers for a smaller scan isn't really
because we think that using more workers will cause a regression.
It's because we think it may not help very much, and because it's not
worth firing up a ton of workers for a relatively small scan given
that workers are a limited resource. I think once we've got a bunch
of workers started, we might as well try to use them.
One possible side-effect I see is: other sessions might
not get a fair share of workers. But again, there might be a
counter-argument that, because Append is now focusing all the workers
on the last subplan, it may finish faster and release *all* of its
workers earlier.
BTW, there is going to be some logic change in the choose-next-subplan
algorithm if we consider giving extra workers to subplans.
On Wed, Feb 15, 2017 at 11:15 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
If the user is ready to throw 200 workers, and if the subplans can use
them to speed up the query 200 times (obviously I am exaggerating),
why not use them? When the user sets
max_parallel_workers_per_gather to that high a number, he means it to
be used by a Gather, and that's what we should be doing.
The reason is because of what Amit Khandekar wrote in his email -- you
get a result with a partitioned table that is wildly inconsistent with
the result you get for an unpartitioned table. You could equally well
argue that if the user sets max_parallel_workers_per_gather to 200,
and there's a parallel sequential scan of an 8MB table to be
performed, we ought to use all 200 workers for that. But the planner
in fact estimates a much smaller number of workers, because using 200
workers for that task wastes a lot of resources for no real
performance benefit. If you partition that 8MB table into 100 tables
that are each 80kB, that shouldn't radically increase the number of
workers that get used.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
What I was thinking about is something like this:
1. First, take the maximum parallel_workers value from among all the children.
2. Second, compute log2(num_children)+1 and round up. So, for 1
child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
for 9-16 children, 5, and so on.
3. Use as the number of parallel workers for the children the maximum
of the value computed in step 1 and the value computed in step 2.
Ah, now that I closely look at compute_parallel_worker(), I see what
you are getting at.
For a plain unpartitioned table, parallel_workers is calculated as
roughly equal to log(num_pages) (actually it is log3). So if the table
size is n, the workers will be log(n). So if it is partitioned into p
partitions of size n/p each, the number of workers should still be
log(n). Whereas, in the patch, it is calculated as the total of all the
child workers, i.e. p * log(n/p) for this case. But log(n) != p *
log(n/p). For example, log(1000) is much less than log(300) + log(300) +
log(300).
That means the way it is calculated in the patch turns out to be much
larger than if it were calculated using log(total of sizes of all
children). So I think for step 2 above, a log(total_rel_size)
formula seems appropriate. What do you think? (For
compute_parallel_worker(), it is actually log3, by the way.)
BTW this formula is just an extension of how parallel_workers is
calculated for an unpartitioned table.
log(total_rel_size) would be a reasonable way to estimate workers when
we're scanning an inheritance hierarchy, but I'm hoping Parallel
Append is also going to apply to UNION ALL queries, where there's no
concept of the total rel size. For that we need something else, which
is why the algorithm that I proposed upthread doesn't rely on it.
The decision to use fewer workers for a smaller scan isn't really
because we think that using more workers will cause a regression.
It's because we think it may not help very much, and because it's not
worth firing up a ton of workers for a relatively small scan given
that workers are a limited resource. I think once we've got a bunch
of workers started, we might as well try to use them.
One possible side-effect I see is: other sessions might
not get a fair share of workers. But again, there might be a
counter-argument that, because Append is now focusing all the workers
on the last subplan, it may finish faster and release *all* of its
workers earlier.
Right. I think in general it's pretty clear that there are possible
fairness problems with parallel query. The first process that comes
along seizes however many workers it thinks it should use, and
everybody else can use whatever (if anything) is left. In the long
run, I think it would be cool to have a system where workers can leave
one parallel query in progress and join a different one (or exit and
spawn a new worker to join a different one), automatically rebalancing
as the number of parallel queries in flight fluctuates. But that's
clearly way beyond anything we can do right now. I think we should
assume that any parallel workers our process has obtained are ours to
use for the duration of the query, and use them as best we can. Note
that even if the Parallel Append tells one of the workers that there
are no more tuples and it should go away, some higher level of the
query plan could make a different choice anyway; there might be
another Append elsewhere in the plan tree.
BTW, there is going to be some logic change in the choose-next-subplan
algorithm if we consider giving extra workers to subplans.
I'm not sure that it's going to be useful to make this logic very
complicated. I think the most important thing is to give 1 worker to
each plan before we give a second worker to any plan. In general I
think it's sufficient to assign a worker that becomes available to the
subplan with the fewest number of workers (or one of them, if there's
a tie) without worrying too much about the target number of workers
for that subplan.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 16, 2017 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 15, 2017 at 11:15 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
If the user is ready to throw 200 workers, and if the subplans can use
them to speed up the query 200 times (obviously I am exaggerating),
why not use them? When the user sets
max_parallel_workers_per_gather to that high a number, he means it to
be used by a Gather, and that's what we should be doing.
The reason is because of what Amit Khandekar wrote in his email -- you
get a result with a partitioned table that is wildly inconsistent with
the result you get for an unpartitioned table. You could equally well
argue that if the user sets max_parallel_workers_per_gather to 200,
and there's a parallel sequential scan of an 8MB table to be
performed, we ought to use all 200 workers for that. But the planner
in fact estimates a much lesser number of workers, because using 200
workers for that task wastes a lot of resources for no real
performance benefit. If you partition that 8MB table into 100 tables
that are each 80kB, that shouldn't radically increase the number of
workers that get used.
That's true for a partitioned table, but not necessarily for every
append relation. Amit's patch is generic for all append relations. If
the child plans are joins or subquery segments of set operations, I
doubt that the same logic works. It may be better to throw as many
workers (or some function "summing" those up) as specified by those
subplans. I guess we have to use different logic for append relations
which are base relations and append relations which are not base
relations.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 16 February 2017 at 20:37, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
What I was thinking about is something like this:
1. First, take the maximum parallel_workers value from among all the children.
2. Second, compute log2(num_children)+1 and round up. So, for 1
child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
for 9-16 children, 5, and so on.
3. Use as the number of parallel workers for the children the maximum
of the value computed in step 1 and the value computed in step 2.
Ah, now that I closely look at compute_parallel_worker(), I see what
you are getting at.
For a plain unpartitioned table, parallel_workers is calculated as
roughly equal to log(num_pages) (actually it is log3). So if the table
size is n, the workers will be log(n). So if it is partitioned into p
partitions of size n/p each, the number of workers should still be
log(n). Whereas, in the patch, it is calculated as the total of all the
child workers, i.e. p * log(n/p) for this case. But log(n) != p *
log(n/p). For example, log(1000) is much less than log(300) + log(300) +
log(300).
That means the way it is calculated in the patch turns out to be much
larger than if it were calculated using log(total of sizes of all
children). So I think for step 2 above, a log(total_rel_size)
formula seems appropriate. What do you think? (For
compute_parallel_worker(), it is actually log3, by the way.)
BTW this formula is just an extension of how parallel_workers is
calculated for an unpartitioned table.
log(total_rel_size) would be a reasonable way to estimate workers when
we're scanning an inheritance hierarchy, but I'm hoping Parallel
Append is also going to apply to UNION ALL queries, where there's no
concept of the total rel size.
Yes, Parallel Append also gets used in UNION ALL.
For that we need something else, which
is why the algorithm that I proposed upthread doesn't rely on it.
The log2(num_children)+1 formula which you proposed does not take into
account the number of workers for each of the subplans; that's why I
am a bit more inclined to look for some other logic. Maybe treat the
children as if they belong to partitions, and accordingly calculate
the final number of workers. So for 2 children with 4 and 5 workers
respectively, the Append parallel_workers would be log3(3^4 + 3^5).
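Evaluating that (arithmetic only): 3^4 + 3^5 = 81 + 243 = 324, and log3(324) is about 5.3, so that formula would assign roughly 5 workers to the Append -- more than either child alone, but far fewer than the sum of 9.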
The decision to use fewer workers for a smaller scan isn't really
because we think that using more workers will cause a regression.
It's because we think it may not help very much, and because it's not
worth firing up a ton of workers for a relatively small scan given
that workers are a limited resource. I think once we've got a bunch
of workers started, we might as well try to use them.
One possible side-effect I see is: other sessions might
not get a fair share of workers. But again, there might be a
counter-argument that, because Append is now focusing all the workers
on the last subplan, it may finish faster and release *all* of its
workers earlier.
Right. I think in general it's pretty clear that there are possible
fairness problems with parallel query. The first process that comes
along seizes however many workers it thinks it should use, and
everybody else can use whatever (if anything) is left. In the long
run, I think it would be cool to have a system where workers can leave
one parallel query in progress and join a different one (or exit and
spawn a new worker to join a different one), automatically rebalancing
as the number of parallel queries in flight fluctuates. But that's
clearly way beyond anything we can do right now. I think we should
assume that any parallel workers our process has obtained are ours to
use for the duration of the query, and use them as best we can.
Note that even if the Parallel Append tells one of the workers that there
are no more tuples and it should go away, some higher level of the
query plan could make a different choice anyway; there might be
another Append elsewhere in the plan tree.
Yeah, that looks good enough to justify not losing the workers.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Do we have any performance measurements where we see that Goal B
performs better than Goal A, in such a situation? Do we have any
performance measurements comparing these two approaches in other
situations? If the implementation for Goal B always beats that of Goal A,
we can certainly implement it directly. But it may not.
I will get back with some performance numbers.
Also, separating patches for Goal A and Goal B might make reviews easier.
Do you anyway want the patch in its current state to be split?
Right now, I am not sure how exactly you need me to split it.
BTW, right now, the patch does not consider non-partial paths for a
child which has partial paths. Do we know, for sure, that a path
containing partial paths for a child that has them is always going to
be cheaper than one which includes a non-partial path? If not,
should we build another path which contains non-partial paths for all
child relations? This sounds like a 0/1 knapsack problem.

I didn't quite get this. We do create a non-partial Append path using
non-partial child paths anyway.

Let's say a given child relation has both partial and non-partial
paths; your approach would always pick up a partial path. But now that
parallel append can handle non-partial paths as well, it may happen
that picking up a non-partial path instead of a partial one when both
are available gives overall better performance. Have we ruled out that
possibility?
Yes, one Append path can contain a child c1 with a partial path, another
Append path can contain child c1 with a non-partial path, and each of
these combinations can have two more combinations for child2, and so on,
leading to too many Append paths. I think that's what you referred to
as a 0/1 knapsack problem. Right, this does not seem worth it.
I had earlier considered adding a partial Append path containing only
non-partial paths, but for some reason I had concluded that it's not
worth having this path, as its cost is most likely going to be higher
due to the presence of all single-worker paths *and* also a Gather above
them. I should have documented the reason. Let me give a thought on
this.
Let me try keeping the per-subplan max_worker info in Append path
itself, like I mentioned above. If that works, the bitmap will be
replaced by a max_worker field. In case of a non-partial subpath,
max_worker will be 1. (This is the same info kept in the AppendState node
in the patch, but now we might need to keep it in the Append path node as
well.)

It will be better if we can fetch that information from each subpath
when creating the plan. As I have explained before, a path is a minimal
structure, which should be easily disposable when the path is thrown
away.

Now in the v2 patch, we store the per-subplan worker count. But still, we
cannot use the path->parallel_workers to determine whether it's a
partial path. This is because even for a non-partial path, it seems
the parallel_workers can be non-zero. For example, in
create_subqueryscan_path(), it sets path->parallel_workers to
subpath->parallel_workers. But this path is added as a non-partial
path. So we need separate info as to which of the subpaths in the Append
path are partial subpaths. So in the v2 patch, I continued to use a
Bitmapset in AppendPath. But in the Append plan node, the number of
workers is calculated using this bitmapset. Check the new function
get_append_num_workers().

If the subpath is from childrel->partial_pathlist, then we set the
corresponding bit in the bitmap. But we can infer that for any path by
checking whether it is found in path->parent->partial_pathlist. Since the
code always chooses the first partial path, the search in partial_pathlist
should not affect performance. So, we can avoid maintaining a bitmap
in the path, and avoid accumulating it when collapsing append paths.
Thanks. I have accordingly made these changes in the attached v4 patch.
get_append_num_workers() now uses
linitial(path->parent->partial_pathlist) to determine whether the
subpath is a partial or a non-partial path. Removed the bitmapset
field from AppendPath.
12. cost_append() essentially adds the costs of all the subpaths and then
divides by the parallel_divisor. This might work if all the subpaths are
partial paths. But for the subpaths which are not partial, a single worker
will incur the whole cost of that subpath. Hence just dividing the total
cost doesn't seem the right thing to do. We should apply different logic
for costing non-partial subpaths and partial subpaths.

With the current partial path costing infrastructure, it is assumed
that a partial path node should return the average per-worker cost.
Hence, I thought it would be best to do it in a similar way for
Append. But let me think if we can do something; with the current
parallelism costing infrastructure, I am not sure though.

The current parallel mechanism is in sync with that costing. Each
worker is supposed to take the same burden, hence the same (average)
cost. But it will change when a single worker has to scan an entire
child relation and different child relations have different sizes.

I gave this more thought. Considering that each subplan has a different
number of workers, I think it makes sense to calculate average
per-worker cost even in parallel Append. In case of a non-partial
subplan, a single worker will execute it, but it will next choose
another subplan. So on average each worker is going to process the
same number of rows, and also do the same amount of CPU work. And that
CPU cost and rows cost should be calculated by taking the total
count and dividing it by the number of workers (parallel_divisor actually).

That's not entirely true. Consider N child relations with chosen paths
with costs C1, C2, ... CN which are very, very different. If there are
N workers, the total cost should correspond to the highest of the
costs of subpaths, since no worker will execute more than one plan.
The unfortunate worker which executes the costliest path would take
the longest time.
Yeah, there seems to be no specific method that can compute the total
cost as the maximum of all the subplans' total costs. So the assumption
is that there would be a roughly equal distribution of workers.
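To see how far apart the two estimates can drift, here is a rough
illustration (not code from the patch; both helpers are hypothetical):

    #define Max(a, b) ((a) > (b) ? (a) : (b))

    /*
     * Averaging estimate, along the lines of what the patch's
     * cost_append() does: sum the child costs and divide by the number
     * of workers, assuming the work spreads out evenly.
     */
    static double
    append_cost_average(const double *child_cost, int nplans, int nworkers)
    {
        double  total = 0.0;
        int     i;

        for (i = 0; i < nplans; i++)
            total += child_cost[i];

        return total / nworkers;
    }

    /*
     * Worst-case bound for one worker per child: the costliest child
     * determines the elapsed time.
     */
    static double
    append_cost_max(const double *child_cost, int nplans)
    {
        double  max_cost = 0.0;
        int     i;

        for (i = 0; i < nplans; i++)
            max_cost = Max(max_cost, child_cost[i]);

        return max_cost;
    }

For child costs {100, 100, 1000} and 3 workers, the average comes out to
400 while the max-based bound is 1000; the more skewed the child costs,
the further apart the two estimates drift.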
In the new patch, there is a test case output modification for
inherit.sql, because that test case started failing on account of
getting a Parallel Append plan instead of a Merge Append for an
inheritance table where seqscan was disabled.
Attachments:
ParallelAppend_v4.patch
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 98d4f1e..6b34dab 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeSeqscan.h"
@@ -206,6 +207,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -258,6 +263,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecIndexScanInitializeDSM((IndexScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
@@ -737,6 +746,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
case T_IndexScanState:
ExecIndexScanInitializeWorker((IndexScanState *) planstate, toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_ForeignScanState:
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..97bfc89 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,48 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendInfo
+{
+ int pa_num_workers; /* workers currently executing the subplan */
+ int pa_max_workers; /* max workers that should run the subplan */
+} ParallelAppendInfo;
+
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +116,22 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ return parallel_append_next(appendstate);
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +233,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +252,14 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we have already finished all plans (parallel append case) */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+ MyProcPid);
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +281,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +336,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +351,223 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just setting all the worker counts to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ * pa_max_workers is already set initially.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * Estimates the space required for the Append node's parallel
+ * coordination info.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+ List *num_workers_list = ((Append*)node->ps.plan)->num_workers;
+ ListCell *lc;
+ int i;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ Assert(node->as_nplans == list_length(num_workers_list));
+
+ i = 0;
+ foreach(lc, num_workers_list)
+ {
+ /* Initialize the max workers count for each subplan. */
+ padesc->pa_info[i].pa_max_workers = lfirst_int(lc);
+
+ /*
+ * Also, initialize the current number of workers. Just setting all
+ * the worker counts to 0 is enough. The logic of choosing the next
+ * plan in workers will take care of initializing everything else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+
+ i++;
+ }
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not for this
+ * purpose. The spinlock is used so that the pa_num_workers field does
+ * not change while workers are choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ elog(DEBUG2, "Parallelappend : pid %d : finishing plan %d",
+ MyProcPid, whichplan);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];
+
+ /* Ignore plans that are already done processing */
+ if (painfo->pa_num_workers == -1)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d"
+ " since pa_num_workers is -1",
+ MyProcPid, whichplan);
+ continue;
+ }
+
+ /* Ignore plans that are already being processed by max_workers */
+ if (painfo->pa_num_workers == painfo->pa_max_workers)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : ignoring plan %d,"
+ " since reached max_worker count %d",
+ MyProcPid, whichplan, painfo->pa_max_workers);
+ continue;
+ }
+
+ /*
+ * Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
+ }
+
+ /* Increment worker count for the chosen node, if at all we found one. */
+ if (min_whichplan != PA_INVALID_PLAN)
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+ elog(DEBUG2, "ParallelAppend : pid %d : Chosen plan : %d",
+ MyProcPid, min_whichplan);
+
+ /*
+ * If we didn't find any node to work on, it means each subplan is either
+ * finished or has reached its pa_max_workers. In such a case, should this
+ * worker wait for some subplan to have its worker count drop below its
+ * pa_max_workers so that it can choose that subplan ? It turns out that
+ * it's not worth again finding a subplan to work on. A non-partial subplan
+ * can have only one worker anyway, and that worker will execute it to
+ * completion. For a partial subplan, if at all it reaches pa_max_workers,
+ * its worker count will reduce only when its workers find that there is
+ * nothing more to be executed, so there is no point taking up such a node
+ * if its worker count reduces. In conclusion, just stop executing once we
+ * don't find nodes to work on. Indicate the same by returning false.
+ */
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 05d8538..160860e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_NODE_FIELD(num_workers);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b3802b4..a007f69 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_NODE_FIELD(num_workers);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d2f69fe..6142488 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1539,6 +1539,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_NODE_FIELD(num_workers);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index eeacf81..648d04a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1232,14 +1232,50 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
+ }
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
+ Path *parallel_safe_path = NULL;
+ foreach(lcp, childrel->pathlist)
+ {
+ Path *child_path = (Path *) lfirst(lcp);
+ if (child_path->parallel_safe &&
+ child_path->param_info == NULL)
+ {
+ parallel_safe_path = child_path;
+ break;
+ }
+ }
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1322,24 +1358,10 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
-
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
+ int parallel_workers;
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths, NULL);
- /* Generate a partial append path. */
appendpath = create_append_path(rel, partial_subpaths, NULL,
parallel_workers);
add_partial_path(rel, (Path *) appendpath);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d01630f..0ac1feb 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1560,6 +1561,82 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, Relids required_outer)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per worker. We need the totals
+ * for each of the subpaths, so that we can determine the total cost
+ * of the Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * the lowest startup cost is done setting up; track the minimum,
+ * seeding it from the first subpath rather than comparing against
+ * the initial zero.
+ */
+ if (l == list_head(subpaths))
+ path->startup_cost = subpath->startup_cost;
+ else
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+
+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;
+
+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 997bdcf..d36808b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -29,6 +29,7 @@
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -194,7 +195,7 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, List *num_workers, List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -962,6 +963,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
Append *plan;
List *tlist = build_path_tlist(root, &best_path->path);
List *subplans = NIL;
+ List *num_workers_list;
ListCell *subpaths;
/*
@@ -1000,6 +1002,10 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
subplans = lappend(subplans, subplan);
}
+ /* Get a list of number of workers for each of the subplans */
+ (void) get_append_num_workers(best_path->subpaths,
+ &num_workers_list);
+
/*
* XXX ideally, if there's just one child, we'd not bother to generate an
* Append node but just return the single child. At the moment this does
@@ -1007,7 +1013,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, num_workers_list, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5009,7 +5015,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, List *num_workers, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5019,6 +5025,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->num_workers = num_workers;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3d33d46..6f4ab23 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3334,10 +3334,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3248296..7c244d9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1192,6 +1192,55 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ * Optionally return the list of per-subplan worker count through
+ * 'per_subplan_workers'
+ */
+int
+get_append_num_workers(List *subpaths, List **per_subplan_workers)
+{
+ ListCell *lc;
+ int total_workers = 0;
+ int subplan_workers;
+ int i = 0;
+
+ if (per_subplan_workers)
+ *per_subplan_workers = NIL;
+
+ foreach(lc, subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ RelOptInfo *rel = subpath->parent;
+
+ /*
+ * If this subpath is actually the cheapest partial path, take into
+ * account its parallel workers, else consider one worker since it's
+ * non-partial.
+ */
+ if (rel->partial_pathlist != NIL &&
+ (Path *) linitial(rel->partial_pathlist) == subpath)
+ subplan_workers = subpath->parallel_workers;
+ else
+ subplan_workers = 1;
+
+ if (per_subplan_workers)
+ {
+ *per_subplan_workers =
+ lappend_int(*per_subplan_workers, subplan_workers);
+ }
+ total_workers += subplan_workers;
+ i++;
+ }
+
+ /* In no case use more than max_parallel_workers_per_gather. */
+ total_workers = Min(total_workers,
+ max_parallel_workers_per_gather);
+
+ return total_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1203,46 +1252,21 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
- ListCell *l;
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
- foreach(l, subpaths)
- {
- Path *subpath = (Path *) lfirst(l);
-
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
- pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
-
- /* All child paths must have same parameterization */
- Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
- }
+ cost_append(&pathnode->path, subpaths, required_outer);
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5d8fb2e..0efb5cf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -901,6 +901,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
+
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9f41bab..df5cfd5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1184,12 +1185,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f72f7a8..8c06ee0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ List *num_workers;
} Append;
/* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 72200fa..1c47b4b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -98,6 +99,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths, Relids required_outer);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 53cad24..d5429c8 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -62,8 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *subpaths, List **per_subplan_workers);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Relids required_outer,
+ int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a8c8b28..1ef8c4d 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1337,6 +1337,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1403,6 +1404,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 48fb80e..75c6103 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index d48abd7..7a303fa 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,20 +70,21 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(11 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(12 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index a8b7eb1..e9203a1 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -462,11 +462,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
On 16 February 2017 at 20:37, Robert Haas <robertmhaas@gmail.com> wrote:
I'm not sure that it's going to be useful to make this logic very
complicated. I think the most important thing is to give 1 worker to
each plan before we give a second worker to any plan. In general I
think it's sufficient to assign a worker that becomes available to the
subplan with the fewest number of workers (or one of them, if there's
a tie)
without worrying too much about the target number of workers for that subplan.
The reason I have considered per-subplan workers is, for instance, so
that we can respect the parallel_workers reloption set by the user for
different tables. Or, for example, if subquery1 is a big hash join needing
more workers, and subquery2 is a small table requiring far fewer
workers, it seems to make sense to give more workers to subquery1.
On Fri, Feb 17, 2017 at 11:44 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
That's true for a partitioned table, but not necessarily for every
append relation. Amit's patch is generic for all append relations. If
the child plans are joins or subquery segments of set operations, I
doubt if the same logic works. It may be better if we throw as many
workers (or some function "summing" those up) as specified by those
subplans. I guess, we have to use different logic for append relations
which are base relations and append relations which are not base
relations.
Well, I for one do not believe that if somebody writes a UNION ALL
with 100 branches, they should get 100 (or 99) workers. Generally
speaking, the sweet spot for parallel workers on queries we've tested
so far has been between 1 and 4. It's straining credulity to believe
that the number that's correct for parallel append is more than an
order of magnitude larger. Since increasing resource commitment by
the logarithm of the problem size has worked reasonably well for table
scans, I believe we should pursue a similar approach here. I'm
willing to negotiate on the details of what the formula looks like,
but I'm not going to commit something that lets an Append relation try
to grab massively more resources than we'd use for some other plan
shape.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Feb 17, 2017 at 2:56 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
The log2(num_children)+1 formula which you proposed does not take into
account the number of workers for each of the subplans; that's why I
am a bit more inclined to look for some other logic. Maybe treat the
children as if they belong to partitions, and accordingly calculate
the final number of workers. So for 2 children with 4 and 5 workers
respectively, Append parallel_workers would be : log3(3^4 + 3^5) .
In general this will give an answer not different by more than 1 or 2
from my answer, and often exactly the same. In the case you mention,
whether we get the same answer depends on which way you round:
log3(3^4+3^5) is 5 if you round down, 6 if you round up (3^4 + 3^5 =
324, which lies between 3^5 = 243 and 3^6 = 729).
My formula is more aggressive when there are many subplans that are
not parallel or take only 1 worker, because I'll always use at least 5
workers for an append that has 9-16 children, whereas you might use
only 2 if you do log3(3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0). In that
case I like my formula better. With lots of separate children, the
chances of being able to use as many as 5 workers seem good. (Note
that using 9 workers as Ashutosh seems to be proposing would be a
waste if the different children have very unequal execution times,
because the workers that run children with short execution times can
be reused to run additional subplans while the long ones are still
running. Running a separate worker for each child only works out if
the shortest runtime is more than 50% of the longest runtime, which
may sometimes be true but doesn't seem like a good bet in general.)
Your formula is more aggressive when you have 3 children that all use
the same number of workers; it'll always decide on <number of workers
per child>+1, whereas mine won't add the extra worker in that case.
Possibly your formula is better than mine in that case, but I'm not
sure. If you have as many as 9 children that all want N workers, your
formula will decide on N+2 workers, but since my formula guarantees a
minimum of 5 workers in such cases, I'll probably be within 1 of
whatever answer you were getting.
Basically, I don't believe that the log3(n) thing is anything very
special or magical. The fact that I settled on that formula for
parallel sequential scan doesn't mean that it's exactly right for
every other case. I do think it's likely that increasing workers
logarithmically is a fairly decent strategy here, but I wouldn't get
hung up on using log3(n) in every case or making all of the answers
100% consistent according to some grand principle. I'm not even sure
log3(n) is right for parallel sequential scan, so insisting that
Parallel Append has to work that way when I had no better reason than
gut instinct for picking that for Parallel Sequential Scan seems to me
to be a little unprincipled. We're still in the early stages of this
parallel query experiment, and a decent number of these algorithms are
likely to change as we get more sophisticated. For now at least, it's
more important to pick things that work well pragmatically than to be
theoretically optimal.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Feb 19, 2017 at 2:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 17, 2017 at 11:44 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

That's true for a partitioned table, but not necessarily for every
append relation. Amit's patch is generic for all append relations. If
the child plans are joins or subquery segments of set operations, I
doubt if the same logic works. It may be better if we throw as many
workers (or some function "summing" those up) as specified by those
subplans. I guess, we have to use different logic for append relations
which are base relations and append relations which are not base
relations.

Well, I for one do not believe that if somebody writes a UNION ALL
with 100 branches, they should get 100 (or 99) workers. Generally
speaking, the sweet spot for parallel workers on queries we've tested
so far has been between 1 and 4. It's straining credulity to believe
that the number that's correct for parallel append is more than an
order of magnitude larger. Since increasing resource commitment by
the logarithm of the problem size has worked reasonably well for table
scans, I believe we should pursue a similar approach here.
Thanks for that explanation. It makes sense. So, something like this
would work: total number of workers = some function of log(sum of
sizes of relations). The number of workers allotted to each segment
is restricted to the number of workers chosen by the planner
while planning that segment. The patch takes care of the limit right
now. It needs to incorporate the calculation of the total number of
workers for the append.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Feb 20, 2017 at 10:54 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Sun, Feb 19, 2017 at 2:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 17, 2017 at 11:44 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

That's true for a partitioned table, but not necessarily for every
append relation. Amit's patch is generic for all append relations. If
the child plans are joins or subquery segments of set operations, I
doubt if the same logic works. It may be better if we throw as many
workers (or some function "summing" those up) as specified by those
subplans. I guess, we have to use different logic for append relations
which are base relations and append relations which are not base
relations.

Well, I for one do not believe that if somebody writes a UNION ALL
with 100 branches, they should get 100 (or 99) workers. Generally
speaking, the sweet spot for parallel workers on queries we've tested
so far has been between 1 and 4. It's straining credulity to believe
that the number that's correct for parallel append is more than an
order of magnitude larger. Since increasing resource commitment by
the logarithm of the problem size has worked reasonably well for table
scans, I believe we should pursue a similar approach here.

Thanks for that explanation. It makes sense. So, something like this
would work: total number of workers = some function of log(sum of
sizes of relations). The number of workers allotted to each segment
is restricted to the number of workers chosen by the planner
while planning that segment. The patch takes care of the limit right
now. It needs to incorporate the calculation of the total number of
workers for the append.
log(sum of sizes of relations) isn't well-defined for a UNION ALL query.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 19 February 2017 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 17, 2017 at 2:56 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
The log2(num_children)+1 formula which you proposed does not take into
account the number of workers for each of the subplans; that's why I
am a bit more inclined to look for some other logic. Maybe treat the
children as if they belong to partitions, and accordingly calculate
the final number of workers. So for 2 children with 4 and 5 workers
respectively, Append parallel_workers would be : log3(3^4 + 3^5) .

In general this will give an answer not different by more than 1 or 2
from my answer, and often exactly the same. In the case you mention,
whether we get the same answer depends on which way you round:
log3(3^4+3^5) is 5 if you round down, 6 if you round up.

My formula is more aggressive when there are many subplans that are
not parallel or take only 1 worker, because I'll always use at least 5
workers for an append that has 9-16 children, whereas you might use
only 2 if you do log3(3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0). In that
case I like my formula better. With lots of separate children, the
chances of being able to use as many as 5 workers seem good. (Note
that using 9 workers as Ashutosh seems to be proposing would be a
waste if the different children have very unequal execution times,
because the workers that run children with short execution times can
be reused to run additional subplans while the long ones are still
running. Running a separate worker for each child only works out if
the shortest runtime is more than 50% of the longest runtime, which
may sometimes be true but doesn't seem like a good bet in general.)

Your formula is more aggressive when you have 3 children that all use
the same number of workers; it'll always decide on <number of workers
per child>+1, whereas mine won't add the extra worker in that case.
Possibly your formula is better than mine in that case, but I'm not
sure. If you have as many as 9 children that all want N workers, your
formula will decide on N+2 workers, but since my formula guarantees a
minimum of 5 workers in such cases, I'll probably be within 1 of
whatever answer you were getting.
Yeah, that seems to be right in most of the cases. The only cases
where your formula seems to give too few workers are something like
(2, 8, 8). For such subplans, we should allocate at least 8 workers.
It turns out that in most of the cases with my formula, the number of
Append workers allocated is just 1 more than the max per-subplan
worker count. So for (2, 1, 1, 8), it will be a fraction more than 8.
So in the patch, in addition to the log2() formula you proposed, I
have made sure that it allocates at least max(per-subplan
parallel_workers values), as sketched below.
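A minimal sketch of that adjusted rule (a hypothetical helper, not the
patch's actual get_append_num_workers()):

    #include <math.h>

    #define Max(a, b) ((a) > (b) ? (a) : (b))
    #define Min(a, b) ((a) < (b) ? (a) : (b))

    /*
     * log2-based estimate with a floor of the largest per-subplan worker
     * count, capped by max_parallel_workers_per_gather (passed in here as
     * max_per_gather). So (2, 8, 8) gets at least 8 workers, while 9-16
     * single-worker children still get 5.
     */
    static int
    append_workers(const int *child_workers, int nchildren, int max_per_gather)
    {
        int     nworkers = (int) ceil(log2(nchildren)) + 1;
        int     i;

        for (i = 0; i < nchildren; i++)
            nworkers = Max(nworkers, child_workers[i]);

        return Min(nworkers, max_per_gather);
    }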
BTW, there is going to be some logic change in the choose-next-subplan
algorithm if we consider giving extra workers to subplans.

I'm not sure that it's going to be useful to make this logic very
complicated. I think the most important thing is to give 1 worker to
each plan before we give a second worker to any plan. In general I
think it's sufficient to assign a worker that becomes available to the
subplan with the fewest number of workers (or one of them, if there's
a tie) without worrying too much about the target number of workers
for that subplan.
In the attached v5 patch, the logic of distributing the workers is now
kept simple: it just distributes the workers equally without
considering the per-subplan parallel_workers value. I have retained the
earlier logic of choosing the plan with the minimum current workers. But
now that pa_max_workers is not needed, I removed it, and instead a
partial_plans bitmapset is added in the Append node. Once a worker
picks up a non-partial subplan, it immediately changes its
pa_num_workers to -1. Whereas for partial subplans, the worker sets it
to -1 only after it finishes executing it.
Effectively, in parallel_append_next(), the check for whether a subplan
is executing with the max parallel_workers is now removed, as is all
code that was using pa_max_workers.
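A condensed sketch of that rule (simplified types for illustration; in
the patch the state lives in ParallelAppendDescData and the
partial_plans bitmapset):

    typedef struct SubplanInfo
    {
        int     pa_num_workers;     /* -1 means "do not pick this subplan" */
    } SubplanInfo;

    /*
     * Called (under the spinlock) once a worker has chosen subplan
     * 'chosen'. A non-partial subplan is marked -1 immediately, so only
     * one worker ever runs it; a partial subplan stays available and is
     * marked -1 only when it finishes.
     */
    static void
    claim_subplan(SubplanInfo *info, int chosen, const bool *is_partial)
    {
        if (is_partial[chosen])
            info[chosen].pa_num_workers++;
        else
            info[chosen].pa_num_workers = -1;
    }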
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
10. We should probably move the parallel_safe calculation out of cost_append().

+ path->parallel_safe = path->parallel_safe &&
+ subpath->parallel_safe;

11. This check shouldn't be part of cost_append().

+ /* All child paths must have same parameterization */
+ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
I have moved these two statements out of cost_append(); this is now done
separately in create_append_path().
Also, I have removed some elog() statements which were executed while
holding the spinlock in parallel_append_next().
On 17 January 2017 at 11:10, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I was looking at the executor portion of this patch and I noticed that in
exec_append_initialize_next():

    if (appendstate->as_padesc)
        return parallel_append_next(appendstate);

    /*
     * Not parallel-aware. Fine, just go on to the next subplan in the
     * appropriate direction.
     */
    if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
        appendstate->as_whichplan++;
    else
        appendstate->as_whichplan--;

which seems to mean that executing Append in parallel mode disregards the
scan direction. I am not immediately sure what implications that has, so
I checked what heap scan does when executing in parallel mode, and found
this in heapgettup():

    else if (backward)
    {
        /* backward parallel scan not supported */
        Assert(scan->rs_parallel == NULL);

Perhaps, AppendState.as_padesc would not have been set if scan direction
is backward, because parallel mode would be disabled for the whole query
in that case (PlannerGlobal.parallelModeOK = false). Maybe add an
Assert() similar to the one in heapgettup().
Right. Thanks for noticing this. I have added a similar Assert in
exec_append_initialize_next().
Attachments:
ParallelAppend_v5.patch
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index de0e2ba..6357f29 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeSeqscan.h"
@@ -213,6 +214,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -273,6 +278,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -771,6 +780,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..a5ffb38 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,56 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendInfo
+{
+ /*
+ * pa_num_workers : number of workers currently executing the subplan. A
+ * worker which finishes a subplan should set pa_num_workers to -1, so that
+ * no new worker picks up this subplan. For a non-partial subplan, a worker
+ * which picks it up should immediately set it to -1, so as to make sure
+ * that no more than one worker is assigned to this subplan. In general, -1
+ * means workers should stop picking it.
+ */
+ int pa_num_workers;
+
+} ParallelAppendInfo;
+
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +124,27 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ {
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(!ScanDirectionIsBackward(appendstate->ps.state->es_direction));
+
+ return parallel_append_next(appendstate);
+ }
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +246,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +265,14 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we are already finished plans from parallel append */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ {
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+ MyProcPid);
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +294,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point in new workers picking up this subplan, so mark it
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +349,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +364,204 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just resetting the number of workers to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
+ }
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+ int i;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ for (i = 0; i < node->as_nplans; i++)
+ {
+ /*
+ * Just setting the number of workers to 0 is enough. The logic
+ * of choosing the next plan in workers will take care of everything
+ * else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not needed for
+ * that; rather, it is used so that this function does not change the
+ * pa_num_workers field while workers are choosing their next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ elog(DEBUG2, "Parallelappend : pid %d : finishing plan %d",
+ MyProcPid, whichplan);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];
+
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (painfo->pa_num_workers == -1)
+ continue;
+
+ /*
+ * Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+
+ /*
+ * Increment worker count for the chosen node, if at all we found one.
+ * For non-partial plans, set it to -1 instead, so that no other workers
+ * run it.
+ */
+ if (min_whichplan != PA_INVALID_PLAN)
+ {
+ if (bms_is_member(min_whichplan,
+ ((Append*)state->ps.plan)->partial_subplans_set))
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+ else
+ padesc->pa_info[min_whichplan].pa_num_workers = -1;
+ }
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and the spinlock is released, some other worker finishes this node
+ * and calls set_finished(). In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since it
+ * is already finished, and then this worker will once again try to choose
+ * the next subplan; but this is OK : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+ elog(DEBUG2, "ParallelAppend : pid %d : Chosen plan : %d",
+ MyProcPid, min_whichplan);
+
+ /*
+ * If we didn't find any node to work on, stop executing. Indicate the same
+ * by returning false.
+ */
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bb2a8a3..67f722a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_BITMAPSET_FIELD(partial_subplans_set);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b3802b4..69f1139 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_BITMAPSET_FIELD(partial_subplans_set);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 05bf2e9..6d3ca5d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1537,6 +1537,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_BITMAPSET_FIELD(partial_subplans_set);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 87a3faf..7a59c8e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1232,14 +1232,50 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
+ }
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
+ Path *parallel_safe_path = NULL;
+ foreach(lcp, childrel->pathlist)
+ {
+ Path *child_path = (Path *) lfirst(lcp);
+ if (child_path->parallel_safe &&
+ child_path->param_info == NULL)
+ {
+ parallel_safe_path = child_path;
+ break;
+ }
+ }
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path);
+ }
+ else
+ {
+ /*
+ * This child rel has neither a partial path nor a parallel-safe
+ * path, so drop the idea of a partial Append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1322,24 +1358,10 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
-
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
+ int parallel_workers;
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths);
- /* Generate a partial append path. */
appendpath = create_append_path(rel, partial_subpaths, NULL,
parallel_workers);
add_partial_path(rel, (Path *) appendpath);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c138f57..ccd6733 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1559,6 +1560,70 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per worker. We need the total row
+ * count and cost of each of the subpaths, so that we can determine
+ * the total cost of the Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up.
+ */
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 1e953b4..04b0414 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -29,6 +29,7 @@
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -194,7 +195,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, Bitmapset *partial_plans_set,
+ List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -962,6 +964,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
List *tlist = build_path_tlist(root, &best_path->path);
List *subplans = NIL;
ListCell *subpaths;
+ Bitmapset *partial_subplans_set;
+ int i;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -987,12 +991,25 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
return plan;
}
- /* Build the plan for each child */
+ /* Build the plan for each child, and a bitmapset of partial subpaths */
+ partial_subplans_set = NULL;
+ i = 0;
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
+ RelOptInfo *rel = subpath->parent;
Plan *subplan;
+ /*
+ * If this subpath is actually the cheapest partial path, add this into
+ * the partial path set.
+ */
+ if (rel->partial_pathlist != NIL &&
+ (Path *) linitial(rel->partial_pathlist) == subpath)
+ partial_subplans_set = bms_add_member(partial_subplans_set, i);
+
+ i++;
+
/* Must insist that all children return the same tlist */
subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
@@ -1006,7 +1023,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, partial_subplans_set, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5003,7 +5020,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, Bitmapset *partial_plans_set, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5013,6 +5030,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->partial_subplans_set = partial_plans_set;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ca0ae78..fb91264 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3334,10 +3334,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3248296..1b8e362 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1192,6 +1192,67 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths)+1 formula seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths with
+ * 1-2 parallel_workers. Whereas, if a subpath's parallel_workers is high,
+ * this formula is not suitable, because it does not take per-subpath
+ * workers into account. For example, with workers (2, 8, 8), the Append
+ * workers should be at least 8, whereas the formula gives 2. In this case,
+ * it seems better to follow the method used for calculating the
+ * parallel_workers of an unpartitioned table : log3(table_size). So we
+ * treat the UNION query as if the data belongs to a single unpartitioned
+ * table, and then derive its workers. That gives : logb(b^w1 + b^w2 + b^w3),
+ * where w1, w2, ... are the per-subplan workers and b is some logarithmic
+ * base such as 2 or 3. It turns out that this evaluates to a value just a
+ * bit greater than max(w1, w2, w3). So we just use the maximum-of-workers
+ * formula. But that formula gives too few workers when all paths have a
+ * single worker (meaning they are non-partial). For example, with workers
+ * (1, 1, 1, 1, 1, 1), it is better to allocate 3 workers, whereas this
+ * method allocates only 1. So we use whichever method gives the higher
+ * number of workers.
+ */
+
+ /* Get log2(num_subpaths) i.e. ln(num_subpaths) / ln(2) */
+ log2w = log(list_length(subpaths)) / 0.693;
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1210,40 +1271,27 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, subpaths);
+
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f8b073d..2994413 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -902,6 +902,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
+
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..c887be6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1185,12 +1186,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f72f7a8..6d772ca 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ Bitmapset *partial_subplans_set;
} Append;
/* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 72200fa..484e179 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -98,6 +99,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 53cad24..dbec534 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -62,8 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Relids required_outer,
+ int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 795d9f5..367d23f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1337,6 +1337,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1403,6 +1404,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 75558d0..3071bae 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index d48abd7..7a303fa 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,20 +70,21 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(11 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(12 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 836ec22..0636f08 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -462,11 +462,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
On Wed, Mar 8, 2017 at 2:00 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Yeah, that seems to be right in most of the cases. The only cases
where your formula seems to give too few workers are for something like
(2, 8, 8). For such subplans, we should allocate at least 8 workers.
It turns out that in most of the cases with my formula, the number of
Append workers allocated is just 1 more than the max per-subplan
worker count. So in (2, 1, 1, 8), it will be a fraction more than 8.
So in the patch, in addition to the log2() formula you proposed, I
have made sure that it allocates at least the max of the per-subplan
parallel_workers values.
Yeah, I agree with that.
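To make that concrete, here is a worked example of the combined rule; the
per-subplan worker counts and the cap are hypothetical, and the code simply
mirrors the formula rather than being the patch itself:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* Hypothetical per-subplan worker counts and worker cap. */
    int     workers[] = {2, 1, 1, 8};
    int     nplans = 4;
    int     cap = 8;        /* plays max_parallel_workers_per_gather */
    double  log2w = log(nplans) / 0.693;    /* log2(number of subplans) */
    int     max_per_plan = 1;
    int     i, num;

    for (i = 0; i < nplans; i++)
        if (workers[i] > max_per_plan)
            max_per_plan = workers[i];

    /* Take the larger of the two estimates, plus one, then apply the cap. */
    num = (int) rint(((log2w > max_per_plan) ? log2w : max_per_plan) + 1);
    if (num > cap)
        num = cap;

    printf("append workers = %d\n", num);   /* 9 before the cap, so 8 here */
    return 0;
}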
Some review:
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose
next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
Instead of having ParallelAppendInfo, how about just int
pa_workers[FLEXIBLE_ARRAY_MEMBER]? The second structure seems like
overkill, at least for now.
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
I don't think this is buying you anything, and suggest backing it out.
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(!ScanDirectionIsBackward(appendstate->ps.state->es_direction));
I think you could assert ScanDirectionIsForward, couldn't you?
NoMovement, I assume, is right out.
+ elog(DEBUG2, "ParallelAppend : pid %d : all plans already
finished",
+ MyProcPid);
Please remove (and all similar cases also).
+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);
I'd use the type name instead.
+ for (i = 0; i < node->as_nplans; i++)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan in workers will take care of everything
+ * else.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
Here I'd use memset.
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
Maybe just return (min_whichplan != PA_INVALID_PLAN);
- childrel->cheapest_total_path);
+
childrel->cheapest_total_path);
Unnecessary.
+ {
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
+ }
Don't need to add braces.
+ /*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */
Can we use get_cheapest_parallel_safe_total_inner for this, from
a71f10189dc10a2fe422158a2c9409e0f77c6b9e?
+ if (rel->partial_pathlist != NIL &&
+ (Path *) linitial(rel->partial_pathlist) == subpath)
+ partial_subplans_set = bms_add_member(partial_subplans_set, i);
This seems like a scary way to figure this out. What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason? I think you ought to record the decision
that set_append_rel_pathlist makes about whether to use a partial path
or a parallel-safe path, and then just copy it over here.
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, 0);
Unnecessary.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
+ if (rel->partial_pathlist != NIL &&
+     (Path *) linitial(rel->partial_pathlist) == subpath)
+     partial_subplans_set = bms_add_member(partial_subplans_set, i);

This seems like a scary way to figure this out. What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason? I think you ought to record the decision
that set_append_rel_pathlist makes about whether to use a partial path
or a parallel-safe path, and then just copy it over here.
I agree that assuming that a subpath is a non-partial path if it's not
the cheapest of the partial paths is risky. In fact, we cannot assume
that even when it's not one of the partial_paths, since it could have
been kicked out or was never added to the partial path list, like a
reparameterized path. But if we have to save the information about
which of the subpaths are partial paths and which are not in
AppendPath, it would take some memory, noticeable for thousands of
partitions, which will leak if the path doesn't make it into
rel->pathlist. The purpose of that information is to make sure that we
allocate only one worker to such a plan. I suggested that we use
path->parallel_workers for the same, but it seems that's not
guaranteed to be reliable. The reasons were discussed upthread. Is
there any way to infer whether we can allocate more than one worker
to a plan by looking at the corresponding path?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
+ if (rel->partial_pathlist != NIL &&
+     (Path *) linitial(rel->partial_pathlist) == subpath)
+     partial_subplans_set = bms_add_member(partial_subplans_set, i);

This seems like a scary way to figure this out. What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason? I think you ought to record the decision
that set_append_rel_pathlist makes about whether to use a partial path
or a parallel-safe path, and then just copy it over here.

I agree that assuming that a subpath is a non-partial path if it's not
the cheapest of the partial paths is risky. In fact, we cannot assume
that even when it's not one of the partial_paths, since it could have
been kicked out or was never added to the partial path list, like a
reparameterized path. But if we have to save the information about
which of the subpaths are partial paths and which are not in
AppendPath, it would take some memory, noticeable for thousands of
partitions, which will leak if the path doesn't make it into
rel->pathlist.
True, but that's no different from the situation for any other Path
node that has substructure. For example, an IndexPath has no fewer
than 5 list pointers in it. Generally we assume that the number of
paths won't be large enough for the memory used to really matter, and
I think that will also be true here. And an AppendPath has a list of
subpaths, and if I'm not mistaken, those list nodes consume more
memory than the tracking information we're thinking about here will.
I think you're thinking about this issue because you've been working
on partitionwise join where memory consumption is a big issue, but
there are a lot of cases where that isn't really a big deal.
The purpose of that information is to make sure that we
allocate only one worker to such a plan. I suggested that we use
path->parallel_workers for the same, but it seems that's not
guaranteed to be reliable. The reasons were discussed upthread. Is
there any way to infer whether we can allocate more than one worker
to a plan by looking at the corresponding path?
I think it would be smarter to track it some other way. Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category. I am not going to say there's no way
we could make it work without either of those things -- looking at the
parallel_workers flag might be made to work, for example -- but the
design idea I had in mind when I put this stuff into place was that
you keep them separate in other ways, not by the data they store
inside them. I think it will be more robust if we keep to that
principle.
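For illustration, the two options might look roughly like this on the path
node; the struct and field names here are hypothetical, with PostgreSQL's
List and Bitmapset types assumed:

#include "nodes/pg_list.h"
#include "nodes/bitmapset.h"

/* Option 1: keep two separate lists on the Append path. */
typedef struct AppendPathTwoLists
{
    List   *partial_subpaths;       /* safe for multiple workers each */
    List   *nonpartial_subpaths;    /* parallel-safe, one worker each */
} AppendPathTwoLists;

/* Option 2: one ordered list plus a bitmapset marking the partial ones. */
typedef struct AppendPathBitmapset
{
    List      *subpaths;            /* all child paths, in order */
    Bitmapset *partial_subpaths;    /* indexes of the partial members */
} AppendPathBitmapset;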
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 9, 2017 at 6:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
True, but that's no different from the situation for any other Path
node that has substructure. For example, an IndexPath has no fewer
than 5 list pointers in it. Generally we assume that the number of
paths won't be large enough for the memory used to really matter, and
I think that will also be true here. And an AppendPath has a list of
subpaths, and if I'm not mistaken, those list nodes consume more
memory than the tracking information we're thinking about here will.
What I have observed is that we try to keep memory usage to a
minimum, avoiding memory consumption as much as possible. Most
of that substructure gets absorbed by the planner or is shared across
paths. Append path lists are an exception to that, but we need
something to hold all the subpaths together, and a list is PostgreSQL's
way of doing it. So, that's kind of unavoidable. And maybe we will find
some reason for almost every substructure in paths.
I think you're thinking about this issue because you've been working
on partitionwise join where memory consumption is a big issue, but
there are a lot of cases where that isn't really a big deal.
:).
I think it would be smarter to track it some other way. Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category.
I like two lists: it consumes almost no memory (two list headers
instead of one) compared to non-parallel-append when there are
non-partial paths and, what's more, it consumes no extra memory when all
paths are partial.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 10 March 2017 at 10:13, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Thu, Mar 9, 2017 at 6:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
This seems like a scary way to figure this out. What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason?
Yes, there was an assumption that an append subpath will be either the
cheapest non-partial path, or the cheapest (i.e. first in the list)
partial path, although in the patch there are no Asserts to make sure
that a common rule has been followed at both these places.
I think it would be smarter to track it some other way. Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category.

I like two lists: it consumes almost no memory (two list headers
instead of one) compared to non-parallel-append when there are
non-partial paths and, what's more, it consumes no extra memory when all
paths are partial.
I agree that the two-lists approach will consume less memory than a
bitmapset. Keeping two lists effectively adds an extra pointer
field to the AppendPath size, but this size will not
grow with the number of subpaths, whereas the Bitmapset will grow.
But as far as the code is concerned, I think the two-list approach will
turn out to be less simple if we derive two corresponding
arrays in the AppendState node. Handling two different arrays during
execution does not look clean, whereas the bitmapset that I have used
in Append has turned out to be very simple. I just had to do the below
check (and that is the only location) to see whether it's a partial or
non-partial subplan. There is no special handling for
non-partial subpaths anywhere else.
/*
* Increment worker count for the chosen node, if at all we found one.
* For non-partial plans, set it to -1 instead, so that no other workers
* run it.
*/
if (min_whichplan != PA_INVALID_PLAN)
{
if (bms_is_member(min_whichplan,
((Append*)state->ps.plan)->partial_subplans_set))
padesc->pa_info[min_whichplan].pa_num_workers++;
else
padesc->pa_info[min_whichplan].pa_num_workers = -1;
}
Now, since the Bitmapset field is used during execution with such
simplicity, why not have this same data structure in AppendPath, and
reuse the bitmapset field in the Append plan node without making a copy
of it? Otherwise, if we have two lists in AppendPath and a bitmap in
Append, there is again going to be code for data structure conversion.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
But as far as the code is concerned, I think the two-list approach will
turn out to be less simple if we derive two corresponding
arrays in the AppendState node. Handling two different arrays during
execution does not look clean, whereas the bitmapset that I have used
in Append has turned out to be very simple. [...]
I think there is some merit in separating out non-parallel and
parallel plans, within the same array or outside it. The current logic
to assign a plan to a worker looks at all the plans, unnecessarily
hopping over the non-parallel ones after they are given to a worker. If
we separate those two, we can keep assigning new workers to the
non-parallel plans first, and then iterate over the parallel ones when
a worker needs a plan to execute. We might eliminate the need for the
special value -1 for num_workers. You may separate those two kinds into
two different arrays, or keep them within the same array and remember
the smallest index of a parallel plan.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Mar 10, 2017 at 11:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
I think there is some merit in separating out non-parallel and
parallel plans, within the same array or outside it. The current logic
to assign a plan to a worker looks at all the plans, unnecessarily
hopping over the non-parallel ones after they are given to a worker. If
we separate those two, we can keep assigning new workers to the
non-parallel plans first, and then iterate over the parallel ones when
a worker needs a plan to execute. We might eliminate the need for the
special value -1 for num_workers. You may separate those two kinds into
two different arrays, or keep them within the same array and remember
the smallest index of a parallel plan.
Further to that, with this scheme and the scheme to distribute workers
equally irrespective of the maximum workers per plan, you don't need
to "scan" the subplans to find the one with the minimum number of
workers. If you treat the array of parallel plans as a circular queue,
the plan to be assigned next to a worker will always be the one after
the plan that was last handed out. Once you have assigned workers
to the non-parallel plans, initialize a shared variable next_plan to
point to the first parallel plan. When a worker comes asking for a
plan, assign the plan pointed to by next_plan and advance it to the
next plan in the circular queue.
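A rough sketch of that circular-queue idea, with made-up names and the
locking reduced to a single spinlock:

#include "storage/spin.h"

typedef struct PARoundRobin
{
    slock_t mutex;
    int     first_parallel_plan;    /* non-parallel plans sit before this */
    int     nplans;
    int     next_plan;              /* next parallel plan to hand out */
} PARoundRobin;

/* Hand the calling worker the next parallel plan in the circular queue. */
static int
pa_round_robin_next(PARoundRobin *rr)
{
    int     chosen;

    SpinLockAcquire(&rr->mutex);
    chosen = rr->next_plan;
    rr->next_plan = (rr->next_plan + 1 >= rr->nplans)
        ? rr->first_parallel_plan
        : rr->next_plan + 1;
    SpinLockRelease(&rr->mutex);

    return chosen;
}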
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 10 March 2017 at 12:33, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Fri, Mar 10, 2017 at 11:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I think there is some merit in separating out non-parallel and
parallel plans, within the same array or outside it. The current logic
to assign a plan to a worker looks at all the plans, unnecessarily
hopping over the non-parallel ones after they are given to a worker. If
we separate those two, we can keep assigning new workers to the
non-parallel plans first, and then iterate over the parallel ones when
a worker needs a plan to execute. We might eliminate the need for the
special value -1 for num_workers. You may separate those two kinds into
two different arrays, or keep them within the same array and remember
the smallest index of a parallel plan.
Do you think we might get performance benefit with this? I am looking
more towards logic simplicity. Non-parallel plans would most likely be
there only in the case of UNION ALL queries, and not partitioned
tables. And UNION ALL queries would probably have far fewer
subplans, so there won't be too many unnecessary hops. The need for
num_workers=-1 will still be there for partial plans, because we need
to set it to -1 once a worker finishes a plan.
Further to that, with this scheme and the scheme to distribute workers
equally irrespective of the maximum workers per plan, you don't need
to "scan" the subplans to find the one with the minimum number of
workers. If you treat the array of parallel plans as a circular queue,
the plan to be assigned next to a worker will always be the one after
the plan that was last handed out. Once you have assigned workers
to the non-parallel plans, initialize a shared variable next_plan to
point to the first parallel plan. When a worker comes asking for a
plan, assign the plan pointed to by next_plan and advance it to the
next plan in the circular queue.
At some point, this logic may stop working. Imagine plans running with
worker counts (1, 1, 1). The next worker goes to plan 1, so they run
with (2, 1, 1), and next_plan now points to plan 2. Now suppose the
worker on plan 2 finishes. It should not take plan 2 again, even though
next_plan points to 2; it should take plan 3, or whichever one is not
finished. Maybe a worker that finishes a plan should do this check
before going directly to next_plan. If this turns out to be as simple
as finding the minimum-worker plan, we can use this logic, but that
will have to be checked. We can consider this anyway, even when we have
a single list.
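For what it's worth, a minimal sketch of such a round-robin picker
could look like the below. Everything here (RoundRobinState, next_plan,
finished[]) is hypothetical, not from the patch; it assumes the
parallel plans occupy the tail of the plan array starting at
first_parallel, and that one spinlock protects both the shared cursor
and the per-plan finished flags:

typedef struct RoundRobinState		/* hypothetical shared state */
{
	slock_t		mutex;
	int			next_plan;		/* next parallel plan to hand out */
	bool		finished[FLEXIBLE_ARRAY_MEMBER];
} RoundRobinState;

static int
choose_next_plan_roundrobin(RoundRobinState *rrstate,
							int nplans, int first_parallel)
{
	int			candidate;
	int			tries;

	Assert(first_parallel < nplans);	/* at least one parallel plan */

	SpinLockAcquire(&rrstate->mutex);
	candidate = rrstate->next_plan;

	/* Walk the circular queue of parallel plans, skipping finished ones */
	for (tries = 0; tries < nplans - first_parallel; tries++)
	{
		if (!rrstate->finished[candidate])
			break;
		if (++candidate >= nplans)
			candidate = first_parallel; /* wrap around */
	}

	if (rrstate->finished[candidate])
		candidate = -1;			/* every parallel plan is done */
	else
		rrstate->next_plan = (candidate + 1 >= nplans) ?
			first_parallel : candidate + 1;

	SpinLockRelease(&rrstate->mutex);
	return candidate;
}

The finished[] check is exactly the extra step discussed above: a
worker that finds next_plan pointing at an already-finished subplan
skips ahead until it hits an unfinished one, or gives up after one
full lap.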
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Do you think we might get a performance benefit with this? I am looking
more towards logic simplicity. Non-parallel plans would most likely be
there only in the case of UNION ALL queries, not partitioned tables.
And UNION ALL queries would probably have far fewer subplans, so there
won't be too many unnecessary hops.
A partitioned table which has foreign and local partitions would have
both non-parallel and parallel plans, if the foreign plans cannot be
parallelized, which is the case with postgres_fdw.
The need for
num_workers=-1 will still be there for partial plans, because we need
to set it to -1 once a worker finishes a plan.
IIRC, we do that so that no other workers are assigned to it when
scanning the array of plans. But with the new scheme we don't need to
scan the non-parallel plans when assigning plans to workers, so -1
may not be needed. I may be wrong though.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 10 March 2017 at 14:05, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
The need for
num_workers=-1 will still be there for partial plans, because we need
to set it to -1 once a worker finishes a plan.IIRC, we do that so that no other workers are assigned to it when
scanning the array of plans. But with the new scheme we don't need to
scan the non-parallel plans when assigning plans to workers, so -1
may not be needed. I may be wrong though.
Still, when a worker finishes a partial subplan, it marks it as -1, so
that no new workers pick it up, even if there are other workers already
executing it.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Moin,
Just a question for me to understand the implementation details vs. the
strategy:
Have you considered how the scheduling decision might impact performance
due to "inter-plan parallelism vs. in-plan parallelism"?
So what would be the scheduling strategy? And should there be a fixed one
or a user-influenceable one? And what could be good ones?
A simple example:
E.g. if we have 5 subplans, and each can have at most 5 workers and we
have 5 workers overall.
So, do we:
Assign 5 workers to plan 1. Let it finish.
Then assign 5 workers to plan 2. Let it finish.
and so on
or:
Assign 1 worker to each plan until no workers are left?
In the second case you would have 5 plans running in a quasi-sequential
manner, which might be slower than the other way. Or not; that probably
needs some benchmarks?
Likewise, if you have a mix of plans with max workers like:
Plan A: 1 worker
Plan B: 2 workers
Plan C: 3 workers
Plan D: 1 worker
Plan E: 4 workers
Would the strategy be:
* Serve them in first-come-first-served order? (A,B,C,D?) (Would the
order here be random due to how the plans emerge, i.e. could the user
re-order the query to get a different order?)
* Serve them in max-workers order? (A,D,B,C)
* Serve first all with 1 worker, then fill the rest? (A,D,B,C | A,D,C,B)
* Serve them by some other metric, e.g. index-only scans first, seq-scans
last? Or a mix of all these?
Excuse me if I just didn't see this from the thread so far. :)
Best regards,
Tels
After giving more thought to our discussions, I have used the Bitmapset
structure in AppendPath, as against having two lists, one for partial
and the other for non-partial paths. Attached is patch v6 with the
required changes. accumulate_append_subpath() now also prepares the
bitmapset containing the information about which paths are partial
paths. This is what I had done in the first version.
At this point, I have not given sufficient thought to Ashutosh's
proposal of just keeping track of next_subplan, where we keep assigning
workers to a circle of subplans in round-robin style. For now, the
approach of choosing the subplan with the minimum number of workers
looks pretty simple, so patch v6 is in working condition using the
minimum-worker approach.
On 9 March 2017 at 07:22, Robert Haas <robertmhaas@gmail.com> wrote:
Some review:
+typedef struct ParallelAppendDescData
+{
+	slock_t		pa_mutex;		/* mutual exclusion to choose next subplan */
+	ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;

Instead of having ParallelAppendInfo, how about just int
pa_workers[FLEXIBLE_ARRAY_MEMBER]? The second structure seems like
overkill, at least for now.
For now, I have kept the structure there, in case we add something
after further discussion.
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+	appendstate->as_whichplan = 0;
+}

I don't think this is buying you anything, and suggest backing it out.
This is required for sequential Append, so that we can start executing
from the first subplan.
+	/* Backward scan is not supported by parallel-aware plans */
+	Assert(!ScanDirectionIsBackward(appendstate->ps.state->es_direction));

I think you could assert ScanDirectionIsForward, couldn't you?
NoMovement, I assume, is right out.
Right. Changed.
+	elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+		 MyProcPid);

Please remove (and all similar cases also).
Removed at multiple places.
+		 sizeof(*node->as_padesc->pa_info) * node->as_nplans);

I'd use the type name instead.
Done.
+	for (i = 0; i < node->as_nplans; i++)
+	{
+		/*
+		 * Just setting all the number of workers to 0 is enough. The logic
+		 * of choosing the next plan in workers will take care of everything
+		 * else.
+		 */
+		padesc->pa_info[i].pa_num_workers = 0;
+	}

Here I'd use memset.
Done.
+ return (min_whichplan == PA_INVALID_PLAN ? false : true);
Maybe just return (min_whichplan != PA_INVALID_PLAN);
Done.
-		childrel->cheapest_total_path);
+		childrel->cheapest_total_path);

Unnecessary.
This call now takes more parameters, so I kept the change.
+	{
+		partial_subpaths = accumulate_append_subpath(partial_subpaths,
+									linitial(childrel->partial_pathlist));
+	}

Don't need to add braces.
Removed them.
+	/*
+	 * Extract the first unparameterized, parallel-safe one among the
+	 * child paths.
+	 */

Can we use get_cheapest_parallel_safe_total_inner for this, from
a71f10189dc10a2fe422158a2c9409e0f77c6b9e?
Yes, Fixed.
+	if (rel->partial_pathlist != NIL &&
+		(Path *) linitial(rel->partial_pathlist) == subpath)
+		partial_subplans_set = bms_add_member(partial_subplans_set, i);

This seems like a scary way to figure this out. What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason? I think you ought to record the decision
that set_append_rel_pathlist makes about whether to use a partial path
or a parallel-safe path, and then just copy it over here.
As mentioned above, used Bitmapset in AppendPath.
-		create_append_path(grouped_rel,
-						   paths,
-						   NULL,
-						   0);
+		create_append_path(grouped_rel, paths, NULL, 0);

Unnecessary.
Since the number of parameters changed anyway, I kept the single-line
call.
Please refer to attached patch version v6 for all of the above changes.
Attachments:
ParallelAppend_v6.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a1289e5..41d807c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -214,6 +215,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -278,6 +283,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -781,6 +790,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..f156907 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,56 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendInfo
+{
+ /*
+ * pa_num_workers : the number of workers currently executing the subplan.
+ * A worker which finishes a subplan should set pa_num_workers to -1, so
+ * that no new worker picks this subplan. For a non-partial subplan, the
+ * worker which picks it up should immediately set it to -1, so as to make
+ * sure that no more than 1 worker is assigned to it. In general, -1 means
+ * workers should stop picking it.
+ */
+ int pa_num_workers;
+
+} ParallelAppendInfo;
+
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+
+
+static void exec_append_scan_first(AppendState *appendstate);
static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+ appendstate->as_whichplan = 0;
+}
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -77,6 +124,27 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ {
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(appendstate->ps.state->es_direction));
+
+ return parallel_append_next(appendstate);
+ }
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -178,8 +246,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* initialize to scan first subplan
*/
- appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
+ exec_append_scan_first(appendstate);
return appendstate;
}
@@ -198,6 +265,10 @@ ExecAppend(AppendState *node)
PlanState *subnode;
TupleTableSlot *result;
+ /* Check if we are already finished plans from parallel append */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
/*
* figure out which subplan we are currently processing
*/
@@ -219,14 +290,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -270,6 +345,7 @@ ExecReScanAppend(AppendState *node)
for (i = 0; i < node->as_nplans; i++)
{
PlanState *subnode = node->appendplans[i];
+ ParallelAppendDesc padesc = node->as_padesc;
/*
* ExecReScan doesn't know about my subplans, so I have to do
@@ -284,7 +360,195 @@ ExecReScanAppend(AppendState *node)
*/
if (subnode->chgParam == NULL)
ExecReScan(subnode);
+
+ if (padesc)
+ {
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan will take care of everything else.
+ * pa_max_workers is already set initially.
+ */
+ padesc->pa_info[i].pa_num_workers = 0;
+ }
}
- node->as_whichplan = 0;
- exec_append_initialize_next(node);
+
+ exec_append_scan_first(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_info),
+ sizeof(ParallelAppendInfo) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ /*
+ * Just setting all the number of workers to 0 is enough. The logic
+ * of choosing the next plan in workers will take care of everything
+ * else.
+ */
+ memset(padesc->pa_info, 0, sizeof(ParallelAppendInfo) * node->as_nplans);
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not for this
+ * purpose; it is used so that the pa_num_workers field does not change
+ * while workers are choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_info[whichplan].pa_num_workers = -1;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the optimal subplan that should be executed. The logic is to
+ * choose the subplan that is being executed by the least number of
+ * workers.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int min_whichplan = PA_INVALID_PLAN;
+ int min_workers = -1; /* Keep compiler quiet */
+
+ Assert(padesc != NULL);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Choose the plan with the least number of workers */
+ for (whichplan = 0; whichplan < state->as_nplans; whichplan++)
+ {
+ ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];
+
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (painfo->pa_num_workers == -1)
+ continue;
+
+ /*
+ * Keep track of the node with the least workers so far. For the very
+ * first plan, choose that one as the least-workers node.
+ */
+ if (min_whichplan == PA_INVALID_PLAN ||
+ painfo->pa_num_workers < min_workers)
+ {
+ min_whichplan = whichplan;
+ min_workers = painfo->pa_num_workers;
+ }
+ }
+
+
+ /*
+ * Increment worker count for the chosen node, if at all we found one.
+ * For non-partial plans, set it to -1 instead, so that no other workers
+ * run it.
+ */
+ if (min_whichplan != PA_INVALID_PLAN)
+ {
+ if (bms_is_member(min_whichplan,
+ ((Append*)state->ps.plan)->partial_subplans_set))
+ padesc->pa_info[min_whichplan].pa_num_workers++;
+ else
+ padesc->pa_info[min_whichplan].pa_num_workers = -1;
+ }
+
+ /*
+ * Save the chosen plan index. It can be PA_INVALID_PLAN, which means we
+ * are done with all nodes (Note : this meaning applies only to *parallel*
+ * append).
+ */
+ state->as_whichplan = min_whichplan;
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+ SpinLockRelease(&padesc->pa_mutex);
+
+ /*
+ * If we didn't find any node to work on, stop executing. Indicate the same
+ * by returning false.
+ */
+ return (min_whichplan != PA_INVALID_PLAN);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bfc2ac1..f861881 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_BITMAPSET_FIELD(partial_subplans_set);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7418fbe..0aea119 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_BITMAPSET_FIELD(partial_subplans_set);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d3bbc02..bb366bc 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1565,6 +1565,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_BITMAPSET_FIELD(partial_subplans_set);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b263359..16c2f5b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -99,7 +99,8 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
-static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_append_subpath(List *subpaths, Path *path,
+ Bitmapset **partial_subpaths_set);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1186,6 +1187,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bool subpaths_valid = true;
List *partial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ Bitmapset *partial_subpath_set = NULL;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1245,14 +1247,41 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path,
+ NULL);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
- linitial(childrel->partial_pathlist));
+ linitial(childrel->partial_pathlist),
+ &partial_subpath_set);
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ Path *parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_append_subpath(partial_subpaths,
+ parallel_safe_path, NULL);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1327,7 +1356,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, subpaths,
+ NULL, NULL, 0));
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1335,26 +1365,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
+ int parallel_workers;
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
-
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
+ appendpath = create_append_path(rel, partial_subpaths,
+ partial_subpath_set,
+ NULL, parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1401,12 +1418,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
subpaths_valid = false;
break;
}
- subpaths = accumulate_append_subpath(subpaths, subpath);
+ subpaths = accumulate_append_subpath(subpaths, subpath, NULL);
}
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(rel, subpaths,
+ NULL, required_outer, 0));
}
}
@@ -1490,9 +1508,11 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_neq_total = true;
startup_subpaths =
- accumulate_append_subpath(startup_subpaths, cheapest_startup);
+ accumulate_append_subpath(startup_subpaths,
+ cheapest_startup, NULL);
total_subpaths =
- accumulate_append_subpath(total_subpaths, cheapest_total);
+ accumulate_append_subpath(total_subpaths,
+ cheapest_total, NULL);
}
/* ... and build the MergeAppend paths */
@@ -1584,6 +1604,43 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
return cheapest;
}
+/* concat_append_subpaths
+ * helper function for accumulate_append_subpath()
+ *
+ * child_partial_subpaths_set is the bitmap set to indicate which of the
+ * childpaths are partial paths. This is currently non-NULL only in case
+ * the childpaths belong to an Append path.
+ */
+static List *
+concat_append_subpaths(List *append_subpaths, List *childpaths,
+ Bitmapset **partial_subpaths_set,
+ Bitmapset *child_partial_subpaths_set)
+{
+ int i;
+ int append_subpath_len = list_length(append_subpaths);
+
+ if (partial_subpaths_set)
+ {
+ for (i = 0; i < list_length(childpaths); i++)
+ {
+ /*
+ * The child paths themselves may or may not be partial paths. The
+ * bitmapset numbers of these paths will need to be set considering
+ * that these are to be appended onto the partial_subpaths_set.
+ */
+ if (!child_partial_subpaths_set ||
+ bms_is_member(i, child_partial_subpaths_set))
+ {
+ *partial_subpaths_set = bms_add_member(*partial_subpaths_set,
+ append_subpath_len + i);
+ }
+ }
+ }
+
+ /* list_copy is important here to avoid sharing list substructure */
+ return list_concat(append_subpaths, list_copy(childpaths));
+}
+
/*
* accumulate_append_subpath
* Add a subpath to the list being built for an Append or MergeAppend
@@ -1597,26 +1654,34 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
* omitting a sort step, which seems fine: if the parent is to be an Append,
* its result would be unsorted anyway, while if the parent is to be a
* MergeAppend, there's no point in a separate sort on a child.
+ *
+ * If partial_subpaths_set is not NULL, it means we are building a
+ * partial subpaths list, and so we need to add the path (or its child paths
+ * in case it's Append or MergeAppend) into the partial_subpaths bitmap set.
*/
static List *
-accumulate_append_subpath(List *subpaths, Path *path)
+accumulate_append_subpath(List *append_subpaths, Path *path,
+ Bitmapset **partial_subpaths_set)
{
if (IsA(path, AppendPath))
{
- AppendPath *apath = (AppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(apath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((AppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ ((AppendPath*)path)->partial_subpaths);
}
else if (IsA(path, MergeAppendPath))
{
- MergeAppendPath *mpath = (MergeAppendPath *) path;
-
- /* list_copy is important here to avoid sharing list substructure */
- return list_concat(subpaths, list_copy(mpath->subpaths));
+ return concat_append_subpaths(append_subpaths,
+ ((MergeAppendPath*)path)->subpaths,
+ partial_subpaths_set,
+ NULL);
}
else
- return lappend(subpaths, path);
+ return concat_append_subpaths(append_subpaths,
+ list_make1(path),
+ partial_subpaths_set,
+ NULL);
}
/*
@@ -1639,7 +1704,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e78f3a8..c5da25c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1697,6 +1698,70 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ double parallel_divisor;
+
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost is per worker. We need total count
+ * of each of the subpaths, so that we can determine the total cost
+ * of Append.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up.
+ */
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ else
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 0d00683..daf45c3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1197,7 +1197,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NULL, NULL, 0));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index d002e6d..ce4c1dd 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, Bitmapset *partial_plans_set,
+ List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, best_path->partial_subpaths, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5187,7 +5188,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, Bitmapset *partial_plans_set, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5197,6 +5198,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->partial_subplans_set = partial_plans_set;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 02286d9..2b747c8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3345,10 +3345,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NULL, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1389db1..07ea748 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NULL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ce772d..574dc9e 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1193,6 +1193,67 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths)+1 formula seems to give an appropriate
+ * number of workers for an Append path either having a high number of
+ * children (> 100), or having all non-partial subpaths, or subpaths with
+ * 1-2 parallel_workers. Whereas, if a subpath's parallel_workers is high,
+ * this formula is not suitable, because it does not take into account
+ * per-subpath workers. E.g., with workers (2, 8, 8), the Append workers
+ * should be at least 8, whereas the formula gives 2. In this case, it
+ * seems better to follow the method used for calculating parallel_workers
+ * of an unpartitioned table : log3(table_size). So we treat the UNION
+ * query as if the data belongs to a single unpartitioned table, and then
+ * derive its workers. That gives : logb(b^w1 + b^w2 + b^w3) where w1,
+ * w2... are per-subplan workers and b is some logarithmic base such as 2
+ * or 3. It turns out that this evaluates to a value just a bit greater
+ * than max(w1, w2, w3). So, we just use the maximum-of-workers formula.
+ * But this formula gives too few workers when all paths have a single
+ * worker (meaning they are non-partial): e.g. with workers (1, 1, 1, 1,
+ * 1, 1), it is better to allocate 3 workers, whereas this method
+ * allocates only 1. So we use whichever method gives the higher number
+ * of workers.
+ */
+
+ /* Get log2(num_subpaths) i.e. ln(num_subpaths) / ln(2) */
+ log2w = log(list_length(subpaths)) / 0.693;
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1261,9 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,40 +1273,28 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->subpaths = subpaths;
+ pathnode->partial_subpaths = partial_subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, subpaths);
+
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 811ea51..8aecbff 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b880dc1..41de095 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ Bitmapset *partial_subplans_set;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 05d6f07..be3abda 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1117,6 +1117,7 @@ typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+ Bitmapset *partial_subpaths;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..1f42850 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,7 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 373c722..ee1ecb4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,8 +64,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, Bitmapset *partial_subpaths,
+ Relids required_outer, int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 795d9f5..367d23f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1337,6 +1337,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1403,6 +1404,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..ba963e6 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -17,9 +17,9 @@ explain (costs off)
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 836ec22..0636f08 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -462,11 +462,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
On Fri, Mar 10, 2017 at 12:17 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I agree that the two-lists approach will consume less memory than the
bitmapset. Keeping two lists will effectively add an extra pointer
field to the AppendPath size, but this size will not grow with the
number of subpaths, whereas the Bitmapset will.
Sure. You'll use about one BIT of memory per subpath. I'm kind of
baffled as to why we're treating this as an issue worth serious
discussion; the amount of memory involved is clearly very small. Even
for an appendrel with 1000 children, that's 125 bytes of memory.
Considering the amount of memory we're going to spend planning that
appendrel overall, that's not significant.
However, Ashutosh's response made me think of something: one thing is
that we probably do want to group all of the non-partial plans at the
beginning of the Append so that they get workers first, and put the
partial plans afterward. That's because the partial plans can always
be accelerated by adding more workers as they become available, but
the non-partial plans are just going to take as long as they take - so
we want to start them as soon as possible. In fact, what we might
want to do is actually sort the non-partial paths in order of
decreasing cost, putting the most expensive one first and the others
in decreasing order after that - and then similarly afterward with the
partial paths. If we did that, we wouldn't need to store a bitmapset
OR two separate lists. We could just store the index of the first
partial plan in the list. Then you can test whether a path is partial
by checking whether this_index >= first_partial_index.
One problem with that is that, since the leader has about a 4ms head
start on the other workers, it would tend to pick the most expensive
path to run locally before any other worker had a chance to make a
selection, and that's probably not what we want. To fix that, let's
have the leader start at the end of the list of plans and work
backwards towards the beginning, so that it prefers cheaper and
partial plans over decisions that would force it to undertake a large
amount of work itself.
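For illustration, a sketch of that arrangement might look like the
below (the names are made up, not from any posted patch): with the
subplans array sorted so that non-partial plans come first in
decreasing cost order, the partiality test is a single comparison, and
the leader can walk the array from the opposite end. Shared-memory
locking and the finished[] bookkeeping are omitted for brevity:

/*
 * Sketch only: subplans[0 .. first_partial_plan - 1] are non-partial,
 * sorted by decreasing cost; subplans[first_partial_plan .. nplans - 1]
 * are partial.
 */
static inline bool
plan_is_partial(int plan_index, int first_partial_plan)
{
	return plan_index >= first_partial_plan;
}

static int
choose_plan(bool is_leader, int nplans, const bool *finished)
{
	int			i;

	if (is_leader)
	{
		/* The leader starts from the cheap/partial end of the array... */
		for (i = nplans - 1; i >= 0; i--)
		{
			if (!finished[i])
				return i;
		}
	}
	else
	{
		/* ...while workers grab the expensive non-partial plans first. */
		for (i = 0; i < nplans; i++)
		{
			if (!finished[i])
				return i;
		}
	}
	return -1;					/* nothing left to pick */
}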
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 10, 2017 at 8:12 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+	appendstate->as_whichplan = 0;
+}

I don't think this is buying you anything, and suggest backing it out.
This is required for sequential Append, so that we can start executing
from the first subplan.
My point is that there's really no point in defining a static inline
function containing one line of code. You could just put that line of
code in whatever places need it, which would probably be more clear.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-abuse@bloodgate.com> wrote:
Just a question for me to understand the implementation details vs. the
strategy:

Have you considered how the scheduling decision might impact performance
due to "inter-plan parallelism vs. in-plan parallelism"?

So what would be the scheduling strategy? And should there be a fixed one
or a user-influenceable one? And what could be good ones?

A simple example:
E.g. if we have 5 subplans, and each can have at most 5 workers and we
have 5 workers overall.

So, do we:
Assign 5 workers to plan 1. Let it finish.
Then assign 5 workers to plan 2. Let it finish.
and so on

or:
Assign 1 worker to each plan until no workers are left?
Currently, we do the first of those, but I'm pretty sure the second is
way better. For example, suppose each subplan has a startup cost. If
you have all the workers pile on each plan in turn, every worker pays
the startup cost for every subplan. If you spread them out, then
subplans can get finished without being visited by all workers, and
then the other workers never pay those costs. Moreover, you reduce
contention for spinlocks, condition variables, etc. It's not
impossible to imagine a scenario where having all workers pile on one
subplan at a time works out better: for example, suppose you have a
table with lots of partitions all of which are on the same disk, and
it's actually one physical spinning disk, not an SSD or a disk array
or anything, and the query is completely I/O-bound. Well, it could
be, in that scenario, that spreading out the workers is going to turn
sequential I/O into random I/O and that might be terrible. In most
cases, though, I think you're going to be better off. If the
partitions are on different spindles or if there's some slack I/O
capacity for prefetching, you're going to come out ahead, maybe way
ahead. If you come out behind, then you're evidently totally I/O
bound and have no capacity for I/O parallelism; in that scenario, you
should probably just turn parallel query off altogether, because
you're not going to benefit from it.
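(A back-of-envelope model of the two strategies, purely illustrative and not from the patch: N subplans, each with startup cost S and total work W, and K workers.)

#include <stdio.h>

int main(void)
{
    const int    N = 5, K = 5;          /* subplans, workers */
    const double S = 10.0, W = 100.0;   /* per-subplan startup cost and work */

    /* Pile on: all K workers run each subplan in turn, so every worker
     * pays every startup cost. */
    double pile_on = N * (S + W / K);

    /* Spread out: one worker per subplan; each pays one startup cost. */
    double spread_out = S + W;

    printf("pile-on: %.0f units, spread-out: %.0f units\n",
           pile_on, spread_out);
    return 0;
}

With these numbers the pile-on strategy takes 150 units against 110 for spreading out, and the gap grows with N; the random-I/O caveat above is the main thing this toy model ignores.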
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Moin,
On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
[...]
I agree with the proposition that both strategies can work well, or not,
depending on the system setup and the table and data layout. I'd be a bit
more worried about turning it into the "random I/O" case, but that's still
just a feeling and guesswork.

So which one will be better seems speculative, hence the question about
benchmarking different strategies.
So, I'd like to see the scheduler logic in a single place, maybe a
function that gets called with the number of currently running workers,
the max. number of workers to be expected, the new worker, and the list
of plans still to do, and then schedules that single worker to one of
these plans by strategy X.

That would make it easier to swap out X for Y and see how it fares,
wouldn't it?
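(Something like the following, as a sketch; the signature is hypothetical, not anything in the patch:)

typedef int (*append_scheduler_fn) (bool *pa_finished,  /* per-subplan "done" flags */
                                    int nplans,
                                    int first_partial_plan,
                                    int next_plan_hint); /* shared round-robin cursor */

/* Strategy X: plain round-robin over unfinished subplans.
 * (first_partial_plan would matter only for strategies that treat
 * non-partial plans specially.) */
static int
schedule_round_robin(bool *pa_finished, int nplans,
                     int first_partial_plan, int next_plan_hint)
{
    int     i;

    for (i = 0; i < nplans; i++)
    {
        int     plan = (next_plan_hint + i) % nplans;

        if (!pa_finished[plan])
            return plan;
    }
    return -1;                  /* nothing left to schedule */
}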
However, I don't think the patch needs to select the optimal strategy
right from the beginning (if that even exists, maybe it's a mixed
strategy), even "not so optimal" parallelism will be better than doing all
things sequentially.
Best regards,
Tels
On 10 March 2017 at 22:08, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 10, 2017 at 12:17 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I agree that the two-lists approach will consume less memory than
bitmapset. Keeping two lists will effectively have an extra pointer
field which will add up to the AppendPath size, but this size will not
grow with the number of subpaths, whereas the Bitmapset will grow.

Sure. You'll use about one BIT of memory per subpath. I'm kind of
baffled as to why we're treating this as an issue worth serious
discussion; the amount of memory involved is clearly very small. Even
for an appendrel with 1000 children, that's 125 bytes of memory.
Considering the amount of memory we're going to spend planning that
appendrel overall, that's not significant.
Yes, I agree that we should rather consider other things, like code
simplicity, to determine which data structure we should use in
AppendPath.
However, Ashutosh's response made me think of something: one thing is
that we probably do want to group all of the non-partial plans at the
beginning of the Append so that they get workers first, and put the
partial plans afterward. That's because the partial plans can always
be accelerated by adding more workers as they become available, but
the non-partial plans are just going to take as long as they take - so
we want to start them as soon as possible. In fact, what we might
want to do is actually sort the non-partial paths in order of
decreasing cost, putting the most expensive one first and the others
in decreasing order after that - and then similarly afterward with the
partial paths. If we did that, we wouldn't need to store a bitmapset
OR two separate lists. We could just store the index of the first
partial plan in the list. Then you can test whether a path is partial
by checking whether this_index >= first_partial_index.
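(To make the proposed layout concrete, here is an illustrative sketch; the names are mine, not the patch's:)

/*
 * subpaths layout: [ nonpartial_0 ... nonpartial_{k-1} | partial_k ... ]
 * with each group sorted by descending cost; k is first_partial_index.
 */
static inline bool
path_is_partial(int this_index, int first_partial_index)
{
    return this_index >= first_partial_index;
}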
I agree that we should preferably have the non-partial plans started
first. But I am not sure if it is really worth ordering the partial
plans by cost. The reason we ended up not keeping track of the
per-subplan parallel_workers is that it would not matter much, and
we would just distribute the workers equally among all the subplans
regardless of how big they are. Even if smaller plans get more workers,
they will finish faster, and workers will become available to the larger
subplans sooner.
Anyway, I have given some thought to the logic of choosing the next plan,
and that is irrespective of whether the list is sorted. I have
included Ashutosh's proposal of scanning the array round-robin as
against finding the minimum, since that method will automatically
distribute the workers evenly. Also, the logic uses a single array and
keeps track of the first partial plan. The first section of the array is
non-partial, followed by partial plans. Below is the algorithm ...
There might be corner cases which I haven't yet taken into account, but
first I wanted to get agreement on whether this looks ok to go ahead with.
Since it does not find the minimum worker count, it no longer uses
pa_num_workers. Instead it has a boolean field painfo->pa_finished.
parallel_append_next(AppendState *state)
{
    /* Make a note of which subplan we have started with */
    initial_plan = padesc->next_plan;

    /*
     * Keep going to the next plan until we find an unfinished one. In the
     * process, also keep track of the first unfinished subplan. As the
     * non-partial subplans are taken one by one, the first unfinished
     * subplan will shift ahead, so that we don't have to scan those anymore.
     */
    found = false;
    whichplan = initial_plan;
    for (;;)
    {
        ParallelAppendInfo *painfo = &padesc->pa_info[whichplan];

        /*
         * Ignore plans that are already done processing. These also include
         * non-partial subplans which have already been taken by a worker.
         */
        if (!painfo->pa_finished)
        {
            /*
             * If this is a non-partial plan, immediately mark it finished,
             * and shift ahead first_plan.
             */
            if (whichplan < padesc->first_partial_plan)
            {
                padesc->pa_info[whichplan].pa_finished = true;
                padesc->first_plan++;
            }
            found = true;
            break;
        }

        /* Either go to the next index, or wrap around to the first
           unfinished one */
        whichplan = goto_next_plan(whichplan, padesc->first_plan,
                                   padesc->as_nplans - 1);

        /* Have we wrapped around to where we started? If yes, we are done */
        if (whichplan == initial_plan)
            break;
    }

    /* If we didn't find any plan to execute, stop executing. */
    if (!found)
        return false;

    /* Set the chosen plan, and also the next plan to be picked by
       other workers */
    state->as_whichplan = whichplan;
    padesc->next_plan = goto_next_plan(whichplan, padesc->first_plan,
                                       padesc->as_nplans - 1);
    return true;
}

/* Either go to the next index, or wrap around to the first unfinished one */
int goto_next_plan(curplan, first_plan, last_plan)
{
    if (curplan + 1 <= last_plan)
        return curplan + 1;
    else
        return first_plan;
}
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 12 March 2017 at 19:31, Tels <nospam-pg-abuse@bloodgate.com> wrote:
Moin,
[...]

So, I'd like to see the scheduler logic in a single place, maybe a
function that gets called with the number of currently running workers,
the max. number of workers to be expected, the new worker, and the list
of plans still to do, and then schedules that single worker to one of
these plans by strategy X.

That would make it easier to swap out X for Y and see how it fares,
wouldn't it?
Yes, actually pretty much the scheduler logic is all in one single
function parallel_append_next().
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Mar 13, 2017 at 4:59 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I agree that we should preferably have the non-partial plans started
first. But I am not sure if it is really worth ordering the partial
plans by cost. [...]
Imagine that the plan costs are 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and 10,
and you have 2 workers.

If you move that 10 to the front, this will finish in 10 time units: one
worker runs the 10-cost plan while the other works through the ten 1-cost
plans. If you leave it at the end, it will take 15 time units: the two
workers first split the ten 1-cost plans between them (5 units each), and
only then does one of them start on the 10-cost plan.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 13, 2017 at 7:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
[...]
Oh, never mind. You were only asking whether we should sort partial
plans. That's a lot less important, and maybe not important at all.
The only consideration there is whether we might try to avoid having
the leader start in on a plan with a large startup cost.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12 March 2017 at 08:50, Robert Haas <robertmhaas@gmail.com> wrote:
However, Ashutosh's response made me think of something: one thing is
that we probably do want to group all of the non-partial plans at the
beginning of the Append so that they get workers first, and put the
partial plans afterward. [...] We could just store the index of the
first partial plan in the list. Then you can test whether a path is
partial by checking whether this_index >= first_partial_index.
Attached is an updated patch v7, which does the above. Now,
AppendState->subplans has all non-partial subplans followed by all
partial subplans, with the non-partial subplans in the order of
descending total cost. Also, for convenience, the AppendPath now has
similar ordering in its AppendPath->subpaths. So there is a new field
in both Append and AppendPath, first_partial_path/plan, which
has value 0 if there are no non-partial subpaths.

Also, the leader backend now scans in reverse, so that it does not take
up the most expensive path.

There are also some changes in the costing. Now that we know that
the very first path is the costliest non-partial path, we can use its
total cost as the total cost of Append in case all the partial path
costs are smaller.
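In effect (this is what cost_append() in the attached patch does), the
parallel Append cost is clamped from below by the costliest non-partial
subpath:

    total_cost = Max(total cost of the costliest non-partial subpath,
                     per-worker share of the summed subpath costs)

since no number of extra workers can make the Append finish before that
single-worker subpath does.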
Modified/enhanced an existing test scenario in
src/test/regress/select_parallel.sql so that Parallel Append is
covered.
As suggested by Robert, since pa_info->pa_finished was the only field
in pa_info, removed the ParallelAppendDescData.pa_info structure, and
instead brought pa_info->pa_finished into ParallelAppendDescData.
[...] My point is that there's really no point in defining a static inline
function containing one line of code. You could just put that line of
code in whatever places need it, which would probably be more clear.
Did the same.
Attachments:
ParallelAppend_v7.patch
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a1289e5..41d807c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -214,6 +215,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -278,6 +283,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -781,6 +790,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6986cae..254e5ed 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,10 +59,48 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
-static bool exec_append_initialize_next(AppendState *appendstate);
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ slock_t pa_mutex; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i] : true if no new worker should pick up subplan i. A worker
+ * which finishes a subplan should set its pa_finished flag to true, so that
+ * no new worker picks this subplan. For a non-partial subplan, a worker which
+ * picks up that subplan should immediately set it to true, so as to make
+ * sure there is no more than 1 worker assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+static bool exec_append_initialize_next(AppendState *appendstate);
+static void set_finished(ParallelAppendDesc padesc, int whichplan);
+static bool parallel_append_next(AppendState *state);
+static bool leader_next(AppendState *state);
+static int goto_next_plan(int curplan, int first_plan, int last_plan);
+
/* ----------------------------------------------------------------
* exec_append_initialize_next
*
@@ -77,6 +115,27 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ {
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(appendstate->ps.state->es_direction));
+
+ return parallel_append_next(appendstate);
+ }
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -176,10 +235,9 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * In case it's a sequential Append, initialize to scan first subplan.
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +257,14 @@ ExecAppend(AppendState *node)
TupleTableSlot *result;
/*
+ * Check if we have already finished the plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
+ /*
* figure out which subplan we are currently processing
*/
subnode = node->appendplans[node->as_whichplan];
@@ -219,14 +285,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ set_finished(node->as_padesc, node->as_whichplan);
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -266,6 +336,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -285,6 +356,284 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+ SpinLockInit(&padesc->pa_mutex);
+
+ /*
+ * Just initializing all the pa_finished flags to false is enough. The
+ * logic of choosing the next plan in workers will take care of
+ * everything else.
+ */
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * set_finished
+ *
+ * Indicate that this child plan node is about to be finished, so no other
+ * workers should take up this node. Workers who are already executing
+ * this node will continue to do so, but workers looking for next nodes to
+ * pick up would skip this node after this function is called. It is
+ * possible that multiple workers call this function for the same node at
+ * the same time, because these workers were executing the same node and
+ * they finished with it at the same time. The spinlock is not for this
+ * purpose. The spinlock is used so that the pa_finished flags and the
+ * pa_next_plan field do not change while workers are choosing the next node.
+ * ----------------------------------------------------------------
+ */
+static void
+set_finished(ParallelAppendDesc padesc, int whichplan)
+{
+ SpinLockAcquire(&padesc->pa_mutex);
+ padesc->pa_finished[whichplan] = true;
+ SpinLockRelease(&padesc->pa_mutex);
+}
+
+/* ----------------------------------------------------------------
+ * parallel_append_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter index to start looking for an unfinished plan,
+ * executes the subplan, then shifts ahead this counter to the next
+ * subplan, so that other workers know which next plan to choose. This
+ * way, workers choose the subplans in round robin order, and thus they
+ * get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+parallel_append_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return leader_next(state);
+
+ SpinLockAcquire(&padesc->pa_mutex);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = goto_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ /* If we didn't find any plan to execute, stop executing. */
+ if (!found)
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ else
+ {
+ /*
+ * If this a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = goto_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ SpinLockRelease(&padesc->pa_mutex);
+
+ /*
+ * Note: There is a chance that just after the child plan node is chosen
+ * here and spinlock released, some other worker finishes this node and
+ * calls set_finished(). In that case, this worker will go ahead and call
+ * ExecProcNode(child_node), which will return NULL tuple since it is
+ * already finished, and then once again this worker will try to choose
+ * next subplan; but this is ok : it's just an extra "choose_next_subplan"
+ * operation.
+ */
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * leader_next
+ *
+ * To be used only if it's a parallel leader. The backend should scan
+ * backwards from the last plan. This is to prevent it from taking up
+ * the most expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ /* The parallel leader should start from the last subplan. */
+
+ SpinLockAcquire(&padesc->pa_mutex);
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ SpinLockRelease(&padesc->pa_mutex);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * goto_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * ----------------------------------------------------------------
+ */
+static int goto_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 25fd051..7ee0bb8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -236,6 +236,7 @@ _copyAppend(const Append *from)
* copy remainder of node
*/
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7418fbe..38ade5f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -369,6 +369,7 @@ _outAppend(StringInfo str, const Append *node)
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d3bbc02..fa1487a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1565,6 +1565,7 @@ _readAppend(void)
ReadCommonPlan(&local_node->plan);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43bfd23..094809d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -99,7 +99,11 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
+static List *paths_insert_sorted_by_cost(List *paths, List *insert_paths);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1255,6 +1259,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
@@ -1277,14 +1282,42 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (childrel->cheapest_total_path->param_info == NULL)
subpaths = accumulate_append_subpath(subpaths,
- childrel->cheapest_total_path);
+ childrel->cheapest_total_path);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
- partial_subpaths = accumulate_append_subpath(partial_subpaths,
- linitial(childrel->partial_pathlist));
+ partial_subpaths = accumulate_partialappend_subpath(
+ partial_subpaths,
+ linitial(childrel->partial_pathlist),
+ true, &nonpartial_subpaths);
+ else if (enable_parallelappend)
+ {
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ Path *parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* If we got one parallel-safe path, add it */
+ if (parallel_safe_path)
+ {
+ partial_subpaths =
+ accumulate_partialappend_subpath(partial_subpaths,
+ parallel_safe_path, false,
+ &nonpartial_subpaths);
+ }
+ else
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. So drop the idea for partial append path.
+ */
+ partial_subpaths_valid = false;
+ }
+ }
else
partial_subpaths_valid = false;
@@ -1359,7 +1392,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0));
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1367,26 +1401,14 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (partial_subpaths_valid)
{
AppendPath *appendpath;
- ListCell *lc;
- int parallel_workers = 0;
-
- /*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
- */
- foreach(lc, partial_subpaths)
- {
- Path *path = lfirst(lc);
+ int parallel_workers;
- parallel_workers = Max(parallel_workers, path->parallel_workers);
- }
- Assert(parallel_workers > 0);
+ parallel_workers = get_append_num_workers(partial_subpaths,
+ nonpartial_subpaths);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
+ appendpath = create_append_path(rel, nonpartial_subpaths,
+ partial_subpaths,
+ NULL, parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1438,7 +1460,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0));
}
}
@@ -1652,6 +1675,123 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * paths_insert_sorted_by_cost
+ *
+ * Insert each of the paths of 'insert_paths' into the right position in
+ * 'paths' so that 'paths' remains sorted by descending order of total cost.
+ *
+ * Return a possibly modified 'paths'.
+ */
+static List *
+paths_insert_sorted_by_cost(List *paths, List *insert_paths)
+{
+ ListCell *lci;
+
+ foreach(lci, insert_paths)
+ {
+ Path *insert_path = (Path *) lfirst(lci);
+ ListCell *lcp;
+ ListCell *prev;
+
+ /* Insert this 'insert_path' into 'paths' */
+ prev = NULL;
+ foreach(lcp, paths)
+ {
+ Path *path = (Path *) lfirst(lcp);
+
+ /* found the right position ? */
+ if (path->total_cost < insert_path->total_cost)
+ break;
+ prev = lcp;
+ }
+
+ /* Inserting before the first element ? */
+ if (prev == NULL)
+ paths = lcons(insert_path, paths);
+ else
+ (void) lappend_cell(paths, prev, insert_path);
+ }
+
+ return paths;
+}
+
+/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths =
+ paths_insert_sorted_by_cost(*nonpartial_subpaths,
+ apath_nonpartial_paths);
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If at all MergeAppend is partial, all its child plans have to be
+ * partial : we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since MergePath itself is non-partial, treat all its subpaths
+ * non-partial.
+ */
+ *nonpartial_subpaths =
+ paths_insert_sorted_by_cost(*nonpartial_subpaths,
+ mpath->subpaths);
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths =
+ paths_insert_sorted_by_cost(*nonpartial_subpaths,
+ list_make1(subpath));
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1671,7 +1811,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL, 0));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..e8df075 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1704,6 +1705,98 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+ Cost highest_nonpartial_cost = 0;
+ int worker;
+
+ /*
+ * Make a note of the cost of first non-partial subpath, i.e. the first
+ * one in the list, if at all there are any non-partial subpaths.
+ */
+ if (num_nonpartial_subpaths > 0)
+ highest_nonpartial_cost = ((Path *) linitial(subpaths))->total_cost;
+
+ worker = 1;
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per worker. We need the total count
+ * of each of the subpaths, so that we can determine the total cost
+ * of Append. We don't consider non-partial paths separately. The
+ * parallel_divisor for non-partial paths is 1, and so overall we
+ * get a good approximation of per-worker cost.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (worker <= path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ worker++;
+ }
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ /*
+ * No matter how fast the partial plans finish, the Append path is
+ * going to take at least the time needed for the costliest non-partial
+ * path to finish. This is actually an approximation. We can even
+ * consider all the other non-partial plans and how workers would get
+ * scheduled to determine the cost of finishing the non-partial paths.
+ * But we anyway can't calculate the total cost exactly, especially
+ * because we can't determine exactly when some of the workers would
+ * start executing partial plans.
+ */
+ path->total_cost = Max(highest_nonpartial_cost, path->total_cost);
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 0551668..401d95a 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL, 0));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 89e1946..1702015 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist);
+ plan = make_append(subplans, best_path->first_partial_path, tlist);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5161,7 +5162,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(List *appendplans, int first_partial_plan, List *tlist)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5171,6 +5172,7 @@ make_append(List *appendplans, List *tlist)
plan->lefttree = NULL;
plan->righttree = NULL;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 02286d9..529d91f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3345,10 +3345,7 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
- paths,
- NULL,
- 0);
+ create_append_path(grouped_rel, paths, NIL, NULL, 0);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1389db1..e1d70a8 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL, NULL, 0);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ce772d..6475e23 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1193,6 +1193,70 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * log2(number_of_subpaths)+1 formula seems to give an appropriate number of
+ * workers for Append path either having high number of children (> 100) or
+ * having all non-partial subpaths or subpaths with 1-2 parallel_workers.
+ * Whereas, if the subpaths->parallel_workers is high, this formula is not
+ * suitable, because it does not take into account per-subpath workers.
+ * For e.g., with workers (2, 8, 8), the Append workers should be at least
+ * 8, whereas the formula gives 2. In this case, it seems better to follow
+ * the method used for calculating parallel_workers of an unpartitioned
+ * table : log3(table_size). So we treat the UNION query as if the data
+ * belongs to a single unpartitioned table, and then derive its workers. So
+ * it will be : logb(b^w1 + b^w2 + b^w3) where w1, w2.. are per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out that
+ * this evaluates to a value just a bit greater than max(w1,w2, w3). So, we
+ * just use the maximum of workers formula. But this formula gives too few
+ * workers when all paths have single worker (meaning they are non-partial)
+ * For e.g. with workers : (1, 1, 1, 1, 1, 1), it is better to allocate 3
+ * workers, whereas this method allocates only 1.
+ * So we use whichever method that gives higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) i.e. ln(num_subpaths) / ln(2) */
+ log2w = log(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths))
+ / 0.693 ;
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,9 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer, int parallel_workers)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,40 +1276,29 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware =
+ (enable_parallelappend && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
- pathnode->subpaths = subpaths;
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4feb26a..4f21c2e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b880dc1..79048af 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,7 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 05d6f07..eea8c72 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1108,15 +1108,22 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 373c722..9622d2f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,8 +64,11 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer, int parallel_workers);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6494b20..36be3a7 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1343,6 +1343,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1409,6 +1410,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index e3e9e34..810070a 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -463,11 +463,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On Thu, Mar 16, 2017 at 3:57 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 12 March 2017 at 08:50, Robert Haas <robertmhaas@gmail.com> wrote:
However, Ashutosh's response made me think of something: one thing is
that we probably do want to group all of the non-partial plans at the
beginning of the Append so that they get workers first, and put the
partial plans afterward. That's because the partial plans can always
be accelerated by adding more workers as they become available, but
the non-partial plans are just going to take as long as they take - so
we want to start them as soon as possible. In fact, what we might
want to do is actually sort the non-partial paths in order of
decreasing cost, putting the most expensive one first and the others
in decreasing order after that - and then similarly afterward with the
partial paths. If we did that, we wouldn't need to store a bitmapset
OR two separate lists. We could just store the index of the first
partial plan in the list. Then you can test whether a path is partial
by checking whether this_index >= first_partial_index.

Attached is an updated patch v7, which does the above. Now,
AppendState->subplans has all non-partial subplans followed by all
partial subplans, with the non-partial subplans in the order of
descending total cost. Also, for convenience, the AppendPath also now
has similar ordering in its AppendPath->subpaths. So there is a new
field both in Append and AppendPath : first_partial_path/plan, which
has value 0 if there are no non-partial subpaths.

Also the backend now scans in reverse, so that it does not take up the
most expensive path.

There are also some changes in the costing done. Now that we know that
the very first path is the costliest non-partial path, we can use its
total cost as the total cost of Append in case all the partial path
costs are lesser.

Modified/enhanced an existing test scenario in
src/test/regress/select_parallel.sql so that Parallel Append is
covered.

As suggested by Robert, since pa_info->pa_finished was the only field
in pa_info, removed the ParallelAppendDescData.pa_info structure, and
instead brought pa_info->pa_finished into ParallelAppendDescData.

+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+	appendstate->as_whichplan = 0;
+}

I don't think this is buying you anything, and suggest backing it out.

This is required for sequential Append, so that we can start executing
from the first subplan.

My point is that there's really no point in defining a static inline
function containing one line of code. You could just put that line of
code in whatever places need it, which would probably be more clear.

Did the same.
Some comments
+ * Check if we are already finished plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
From the comment, it looks like this condition will be encountered before the
backend returns any tuple. But this code is part of the loop which returns the
tuples. Shouldn't this be outside the loop? Why do we want to check a condition
for every row returned when the condition can happen only once and that too
before returning any tuple?
Why do we need following code in both ExecAppendInitializeWorker() and
ExecAppendInitializeDSM()? Both of those things happen before starting the
actual execution, so one of those should suffice?
+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
There is no pa_num_worker now, so probably this should get updated. Per comment
we should also get rid of SpinLockAcquire() and SpinLockRelease()?
+ * purpose. The spinlock is used so that it does not change the
+ * pa_num_workers field while workers are choosing the next node.
BTW, sa_finished seems to be a misnomer. The plan is not finished yet, but it
wants no more workers. So, should it be renamed as sa_no_new_workers or
something like that?
In parallel_append_next() we shouldn't need to call goto_next_plan() twice. If
the plan indicated by pa_next_plan is finished, all the plans must have
finished. This should be true if we set pa_next_plan to 0 at the time of
initialization. Any worker picking up pa_next_plan will set it to the next
valid plan. So the next worker asking for plan should pick pa_next_plan and
set it to the next one and so on.
I am wondering whether goto_next_plan() can be simplified as some modular
arithmetic e.g. (whichplan - first_plan)++ % (last_plan - first_plan)
+ first_plan.
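For illustration, a rough C sketch of that modular-arithmetic idea (the
function name is made up here, and note that an inclusive
[first_plan, last_plan] range needs a "+ 1" in the modulus):

static int
goto_next_plan_modular(int whichplan, int first_plan, int last_plan)
{
	/* number of candidate plans in the inclusive range */
	int		nplans = last_plan - first_plan + 1;

	/* advance by one, wrapping around to first_plan past last_plan */
	return first_plan + (whichplan - first_plan + 1) % nplans;
}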
I am still reviewing the patch.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Mar 16, 2017 at 8:48 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
Why do we need following code in both ExecAppendInitializeWorker() and
ExecAppendInitializeDSM()? Both of those things happen before starting
the actual execution, so one of those should suffice?

+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
ExecAppendInitializeWorker runs only in workers, but
ExecAppendInitializeDSM runs only in the leader.
BTW, sa_finished seems to be a misnomer. The plan is not finished yet, but it
wants no more workers. So, should it be renamed as sa_no_new_workers or
something like that?
I think that's not going to improve clarity. The comments can clarify
the exact semantics.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 16, 2017 at 6:27 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Attached is an updated patch v7, which does the above.
Some comments:
- You've added a GUC (which is good) but not documented it (which is
bad) or added it to postgresql.conf.sample (also bad).
- You've used a loop inside a spinlock-protected critical section,
which is against project policy. Use an LWLock; define and document a
new builtin tranche ID.
- The comment for pa_finished claims that it is the number of workers
executing the subplan, but it's a bool, not a count; I think this
comment is just out of date.
- paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't
we find a way to use qsort() for this instead of hand-coding a slower
algorithm? I think we could just create an array of the right length,
stick each path into it from add_paths_to_append_rel, and then qsort()
the array based on <is-partial, total-cost>. Then the result can be
turned into a list.
- Maybe the new helper functions in nodeAppend.c could get names
starting with exec_append_, to match the style of
exec_append_initialize_next().
- There's a superfluous whitespace change in add_paths_to_append_rel.
- The substantive changes in add_paths_to_append_rel don't look right
either. It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case. I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.
- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra. The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial. If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this. (If all the cheapest subpaths
are non-partial, it's still potentially useful.) In other words,
don't skip consideration of parallel append just because you have a
partial path available for every child rel; it could be
- I think the way cost_append() works is not right. What you've got
assumes that you can just multiply the cost of a partial plan by the
parallel divisor to recover the total cost, which is not true because
we don't divide all elements of the plan cost by the parallel divisor
-- only the ones that seem like they should be divided. Also, it
could be smarter about what happens with the costs of non-partial
paths. I suggest the following algorithm instead.
1. Add up all the costs of the partial paths. Those contribute
directly to the final cost of the Append. This ignores the fact that
the Append may escalate the parallel degree, but I think we should
just ignore that problem for now, because we have no real way of
knowing what the impact of that is going to be.
2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.
- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result. Integer arithmetic FTW!
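(For illustration, a sketch of what the fls()-based version might look
like; the cap against max_parallel_workers_per_gather is an assumption
here, and fls() is the one PostgreSQL supplies in src/port for
platforms that lack it:)

int
get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
{
	/* fls(n) is the position of n's highest set bit, i.e. ~log2(n) + 1 */
	int		num_workers = fls(list_length(partial_subpaths) +
							  list_length(nonpartial_subpaths));

	return Min(num_workers, max_parallel_workers_per_gather);
}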
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 17 March 2017 at 01:37, Robert Haas <robertmhaas@gmail.com> wrote:
- You've added a GUC (which is good) but not documented it (which is
bad) or added it to postgresql.conf.sample (also bad).

- You've used a loop inside a spinlock-protected critical section,
which is against project policy. Use an LWLock; define and document a
new builtin tranche ID.

- The comment for pa_finished claims that it is the number of workers
executing the subplan, but it's a bool, not a count; I think this
comment is just out of date.
Yes, agreed. Will fix the above.
- paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't
we find a way to use qsort() for this instead of hand-coding a slower
algorithm? I think we could just create an array of the right length,
stick each path into it from add_paths_to_append_rel, and then qsort()
the array based on <is-partial, total-cost>. Then the result can be
turned into a list.
Yeah, I was in two minds as to whether to do the
copy-to-array-and-qsort thing, or to just write the same number of
lines of code to manually do an insertion sort. Actually I was
searching for whether we already have a linked-list sort, but it seems
we don't have one. Will do the qsort now since it would be faster.
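For illustration, here is a rough sketch of the
copy-to-array-and-qsort idea; the SubpathSortItem struct and the
comparator name are made up, and the actual patch may arrange this
differently:

typedef struct SubpathSortItem
{
	Path	   *path;
	bool		is_partial;
} SubpathSortItem;

/* qsort() comparator: non-partial paths first, and each group in
 * descending order of total cost */
static int
subpath_cmp(const void *a, const void *b)
{
	const SubpathSortItem *sa = (const SubpathSortItem *) a;
	const SubpathSortItem *sb = (const SubpathSortItem *) b;

	if (sa->is_partial != sb->is_partial)
		return sa->is_partial ? 1 : -1;
	if (sa->path->total_cost < sb->path->total_cost)
		return 1;
	if (sa->path->total_cost > sb->path->total_cost)
		return -1;
	return 0;
}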
- Maybe the new helper functions in nodeAppend.c could get names
starting with exec_append_, to match the style of
exec_append_initialize_next().

- There's a superfluous whitespace change in add_paths_to_append_rel.
Will fix this.
- The substantive changes in add_paths_to_append_rel don't look right
either. It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case. I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.
When accumulate_partialappend_subpath() is called for a childrel with
a partial path, it works just like accumulate_append_subpath() when
enable_parallelappend is false. That's why, for partial child path,
the same function is called irrespective of parallel-append or
non-parallel-append case. May be mentioning this in comments should
suffice here ?
- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra. The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial. If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this. (If all the cheapest subpaths
are non-partial, it's still potentially useful.)
In case of all-partial childrels, the paths are *exactly* same as
those that would have been created for enable_parallelappend=off. The
extra path is there for enable_parallelappend=on only when one or more
of the child rels do not have partial paths. Does this make sense ?
In other words,
don't skip consideration of parallel append just because you have a
partial path available for every child rel; it could be
I didn't get this. Are you saying that in the patch it is getting
skipped if enable_parallelappend = off ? I don't think so. For
all-partial child rels, partial append is always created. Only thing
is, in case of enable_parallelappend=off, the Append path is not
parallel_aware, so it executes just like it executes today under
Gather without being parallel-aware.
- I think the way cost_append() works is not right. What you've got
assumes that you can just multiply the cost of a partial plan by the
parallel divisor to recover the total cost, which is not true because
we don't divide all elements of the plan cost by the parallel divisor
-- only the ones that seem like they should be divided.
Yes, that was an approximation done. For those subpaths for which
there is no parallel_divisor, we cannot calculate the total cost
considering the number of workers for the subpath. I feel we should
consider the per-subpath parallel_workers somehow. The
Path->total_cost for a partial path is *always* per-worker cost, right
? Just want to confirm this assumption of mine.
Also, it
could be smarter about what happens with the costs of non-partial
paths. I suggest the following algorithm instead.

1. Add up all the costs of the partial paths. Those contribute
directly to the final cost of the Append. This ignores the fact that
the Append may escalate the parallel degree, but I think we should
just ignore that problem for now, because we have no real way of
knowing what the impact of that is going to be.
I wanted to take into account per-subpath parallel_workers for total
cost of Append. Suppose the partial subpaths have per worker total
costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with 2
Append workers available. So according to what you say, the total cost
is 9. With per-subplan parallel_workers taken into account, total cost
= (3*2 + 3*8 + 3*4)/2 = 21.
May be I didn't follow exactly what you suggested. Your logic is not
taking into account number of workers ? I am assuming you are
calculating per-worker total cost here.
2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.
So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
15) , and so the max is 15 ? I surely am misinterpreting this.
Actually, I couldn't come up with a general formula to find the
non-partial paths total cost, given the per-subplan cost and number of
workers. I mean, we can manually find out the total cost, but turning
it into a formula seems quite involved. We can even do a dry-run of
workers consuming each of the subplan slots and find the total time
units taken, but finding some approximation seemed ok.
For e.g. we can manually find total time units taken for following :
costs (8, 2, 2, 2) with 2 workers : 8
costs (6, 6, 4, 1) with 2 workers : 10.
costs (6, 6, 4, 1) with 3 workers : 6.
But coming up with an algorithm or a formula didn't look worth it. So I
just did the total cost and divided it by workers. And besides that,
took the maximum of the 1st plan cost (since it is the highest) and
the average of total. I understand it would be too much approximation
for some cases, but another thing is, we don't know how to take into
account some of the workers shifting to partial workers. So the shift
may be quite fuzzy since all workers may not shift to partial plans
together.
- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result. Integer arithmetic FTW!
Yeah fls() could be used. BTW I just found that costsize.c already has
this defined in the same way I did:
#define LOG2(x) (log(x) / 0.693147180559945)
May be we need to shift this to some common header file.
On 16 March 2017 at 18:18, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
+ * Check if we are already finished plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);

From the comment, it looks like this condition will be encountered
before the backend returns any tuple. But this code is part of the loop
which returns the tuples. Shouldn't this be outside the loop? Why do we
want to check a condition for every row returned when the condition can
happen only once and that too before returning any tuple?
The way ExecProcNode() gets called, there is no different special code
that gets called instead of ExecProcNode() when a tuple is fetched for
the first time. I mean, we cannot prevent ExecProcNode() from getting
called when as_whichplan is invalid right from the beginning.
One thing we can do is: have a special slot in AppendState->as_plan[]
which has some dummy execution node that just returns a NULL tuple, and
initially make as_whichplan point to this slot. But I think it is not
worth doing this.
We can instead reduce the if condition to:
if (node->as_whichplan == PA_INVALID_PLAN)
{
Assert(node->as_padesc != NULL);
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
BTW, the loop which you mentioned that returns tuples... the loop is
not for returning tuples; the loop is for iterating to the next
subplan. Even if we take the condition out and keep it at the
beginning of ExecAppend, the issue will remain.
Why do we need following code in both ExecAppendInitializeWorker() and
ExecAppendInitializeDSM()? Both of those things happen before starting
the actual execution, so one of those should suffice?

+ /* Choose the optimal subplan to be executed. */
+ (void) parallel_append_next(node);
ExecAppendInitializeWorker() is for the worker to attach (and then
initialize its own local data) to the dsm area created and shared by
ExecAppendInitializeDSM() in the backend. But both the worker and the
backend need to initialize their own as_whichplan to the next subplan.
There is no pa_num_worker now, so probably this should get updated. Per
comment we should also get rid of SpinLockAcquire() and
SpinLockRelease()?

+ * purpose. The spinlock is used so that it does not change the
+ * pa_num_workers field while workers are choosing the next node.
Will do this.
BTW, sa_finished seems to be a misnomer. The plan is not finished yet, but it
wants no more workers. So, should it be renamed as sa_no_new_workers or
something like that?
Actually in this context, "finished" means "we are done with this subplan".
In parallel_append_next() we shouldn't need to call goto_next_plan() twice. If
the plan indicated by pa_next_plan is finished, all the plans must have
finished. This should be true if we set pa_next_plan to 0 at the time of
initialization. Any worker picking up pa_next_plan will set it to the next
valid plan. So the next worker asking for plan should pick pa_next_plan and
set it to the next one and so on.
The current patch does not call it twice, but I might have overlooked
something. Let me know if I have.
I am wondering whether goto_next_plan() can be simplified as some modular
arithmetic e.g. (whichplan - first_plan)++ % (last_plan - first_plan)
+ first_plan.
Hmm. IMHO it seems too much calculation for just shifting to next array element.
On Fri, Mar 17, 2017 at 10:12 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Yeah, I was in two minds as to whether to do the
copy-to-array-and-qsort thing, or to just write the same number of
lines of code to manually do an insertion sort. Actually I was
searching for whether we already have a linked-list sort, but it seems
we don't have one. Will do the qsort now since it would be faster.
relcache.c does an insertion sort with a list of OIDs. See insert_ordered_oid().
--
Peter Geoghegan
2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.

So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
15), and so the max is 15 ? I surely am misinterpreting this.

Actually, I couldn't come up with a general formula to find the
non-partial paths total cost, given the per-subplan cost and number of
workers. I mean, we can manually find out the total cost, but turning
it into a formula seems quite involved. We can even do a dry-run of
workers consuming each of the subplan slots and find the total time
units taken, but finding some approximation seemed ok.

For e.g. we can manually find total time units taken for following :
costs (8, 2, 2, 2) with 2 workers : 8
costs (6, 6, 4, 1) with 2 workers : 10.
costs (6, 6, 4, 1) with 3 workers : 6.

But coming up with an algorithm or a formula didn't look worth it. So I
just did the total cost and divided it by workers. And besides that,
took the maximum of the 1st plan cost (since it is the highest) and
the average of total. I understand it would be too much approximation
for some cases, but another thing is, we don't know how to take into
account some of the workers shifting to partial workers. So the shift
may be quite fuzzy since all workers may not shift to partial plans
together.
For non-partial paths, I did some comparison between the actual cost
and the cost taken by adding the per-subpath figures and dividing by
number of workers. And in the below cases, they do not differ
significantly. Here are the figures :
Case 1 :
Cost units of subpaths : 20 16 10 8 3 1.
Workers : 3
Actual total time to finish all workers : 20.
total/workers: 16.
Case 2 :
Cost units of subpaths : 20 16 10 8 3 1.
Workers : 2
Actual total time to finish all workers : 34.
total/workers: 32.
Case 3 :
Cost units of subpaths : 5 3 3 3 3
Workers : 3
Actual total time to finish all workers : 6
total/workers: 5.6
One more thing observed is that, in all of the above cases, all the
workers more or less finish at about the same time.
So this method seems to compare well with the actual cost. The average
comes out a little less than the actual. But I think what I need to
correct in the patch is to calculate separate per-worker costs of the
non-partial and partial subpaths, and add them. This will give us the
per-worker total cost, which is what a partial Append cost should be. I
had just added all the costs together.
There can be some extreme cases such as (5, 1, 1, 1, 1, 1) with 6
workers, where it will take at least 5 units, but the average is 2. For
that we can clamp the cost up to the first path cost, so that for e.g.
it does not go below 5 in this case.
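(That clamp is a one-liner; a sketch, with both variable names made up:)

/* The Append cannot finish before its costliest non-partial subpath */
append_cost = Max(append_cost, first_nonpartial_subpath_cost);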
Actually I have devised one algorithm to calculate the exact time when
all workers finish the non-partial costs. But I think it does not make
sense to apply it, because it may be too much calculation cost for
hundreds of paths.
But anyways, for archival purpose, here is the algorithm :
Per-subpath cost : 20 16 10 8 3 1, with 3 workers.
After 10 units (this is minimum of 20, 16, 10), the times remaining are :
10 6 0 8 3 1
After 6 units (minimum of 10, 06, 08), the times remaining are :
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are :
2 0 0 0 1 1
After 1 units (minimum of 2, 1, 1), the times remaining are :
1 0 0 0 0 0
After 1 units (minimum of 1, 0 , 0), the times remaining are :
0 0 0 0 0 0
Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
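For illustration, the same dry-run transcribed into a standalone C
sketch (hypothetical, not part of the patch); fed the costs
(20, 16, 10, 8, 3, 1) with 3 workers it returns 20, matching the
chunks added up above:

static double
simulate_nonpartial_cost(double *costs, int nplans, int nworkers)
{
	double	elapsed = 0.0;
	int		started = (nplans < nworkers) ? nplans : nworkers;
	int		unfinished = nplans;
	int		i;

	/* costs[] must be sorted in descending order; it is scribbled on */
	while (unfinished > 0)
	{
		double	step = -1.0;

		/* advance time by the smallest remaining cost among the
		 * currently running plans */
		for (i = 0; i < started; i++)
			if (costs[i] > 0.0 && (step < 0.0 || costs[i] < step))
				step = costs[i];

		elapsed += step;

		/* retire finished plans; each freed worker picks up the next
		 * unstarted plan, if any */
		for (i = 0; i < started; i++)
		{
			if (costs[i] > 0.0)
			{
				costs[i] -= step;
				if (costs[i] <= 0.0)
				{
					unfinished--;
					if (started < nplans)
						started++;
				}
			}
		}
	}
	return elapsed;
}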
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Mar 17, 2017 at 1:12 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
- The substantive changes in add_paths_to_append_rel don't look right
either. It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case. I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.

When accumulate_partialappend_subpath() is called for a childrel with
a partial path, it works just like accumulate_append_subpath() when
enable_parallelappend is false. That's why, for partial child path,
the same function is called irrespective of parallel-append or
non-parallel-append case. May be mentioning this in comments should
suffice here ?
I don't get it. If you can get the same effect by changing something
or not changing it, presumably it'd be better to not change it. We
try not to change things just because we can; the change should be an
improvement in some way.
- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra. The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial. If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this. (If all the cheapest subpaths
are non-partial, it's still potentially useful.)

In case of all-partial childrels, the paths are *exactly* same as
those that would have been created for enable_parallelappend=off. The
extra path is there for enable_parallelappend=on only when one or more
of the child rels do not have partial paths. Does this make sense ?
No, I don't think so. Imagine that we have three children, A, B, and
C. The cheapest partial paths have costs of 10,000 each. A, however,
has a non-partial path with a cost of 1,000. Even though A has a
partial path, we still want to consider a parallel append using the
non-partial path because it figures to be hugely faster.
The
Path->total_cost for a partial path is *always* per-worker cost, right
? Just want to confirm this assumption of mine.
Yes.
Also, it
could be smarter about what happens with the costs of non-partial
paths. I suggest the following algorithm instead.

1. Add up all the costs of the partial paths. Those contribute
directly to the final cost of the Append. This ignores the fact that
the Append may escalate the parallel degree, but I think we should
just ignore that problem for now, because we have no real way of
knowing what the impact of that is going to be.

I wanted to take into account per-subpath parallel_workers for total
cost of Append. Suppose the partial subpaths have per worker total
costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with 2
Append workers available. So according to what you say, the total cost
is 9. With per-subplan parallel_workers taken into account, total cost
= (3*2 + 3*8 + 3*4)/2 = 21.
But that case never happens, because the parallel workers for the
append is always at least as large as the number of workers for any
single child.
May be I didn't follow exactly what you suggested. Your logic is not
taking into account number of workers ? I am assuming you are
calculating per-worker total cost here.

2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.

So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
15), and so the max is 15 ? I surely am misinterpreting this.
No. If you have costs 8, 5, and 2 and only one process, cost is 15.
If you have two processes then for costing purposes you assume worker
1 will execute the first path (cost 8) and worker 2 will execute the
other two (cost 5 + 2 = 7), so the total cost is 8. If you have three
workers, the cost will still be 8, because there's no way to finish
the cost-8 path in less than 8 units of work.
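For illustration, both steps of this suggestion in a rough C sketch
(the function and parameter names are made up; it assumes the costs
have already been split into partial and non-partial arrays):

static Cost
sketch_append_cost(Cost *partial, int npartial,
				   Cost *nonpartial, int nnonpartial, int num_workers)
{
	Cost   *load = (Cost *) palloc0(sizeof(Cost) * num_workers);
	Cost	partial_sum = 0;
	Cost	nonpartial_max = 0;
	int		i, j;

	/* Step 1: per-worker costs of partial subpaths contribute directly */
	for (i = 0; i < npartial; i++)
		partial_sum += partial[i];

	/* Step 2: hand each non-partial plan to the least-loaded worker */
	for (i = 0; i < nnonpartial; i++)
	{
		int		min = 0;

		for (j = 1; j < num_workers; j++)
			if (load[j] < load[min])
				min = j;
		load[min] += nonpartial[i];
	}

	/* the busiest worker's load is the non-partial contribution */
	for (j = 0; j < num_workers; j++)
		nonpartial_max = Max(nonpartial_max, load[j]);

	pfree(load);
	return partial_sum + nonpartial_max;
}

With non-partial costs (8, 5, 2) and two workers this yields 8, as in
the worked example above.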
- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result. Integer arithmetic FTW!

Yeah fls() could be used. BTW I just found that costsize.c already has
this defined in the same way I did:
#define LOG2(x) (log(x) / 0.693147180559945)
May be we need to shift this to some common header file.
LOG2() would make sense if you're working with a value represented as
a double, but if you have an integer input, I think fls() is better.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attached is the updated patch that handles the changes for all the
comments except the cost changes part. Details about the specific
changes are after the cost-related points discussed below.
I wanted to take into account per-subpath parallel_workers for total
cost of Append. Suppose the partial subpaths have per worker total
costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with 2
Append workers available. So according to what you say, the total cost
is 9. With per-subplan parallel_workers taken into account, total cost
= (3*2 + 3*8 + 3*4)/2 = 21.

But that case never happens, because the parallel workers for the
append is always at least as large as the number of workers for any
single child.
Yeah, that's right. I will use this approach for partial paths.
For non-partial paths, I was checking following 3 options :
Option 1. Just take the sum of total non-partial child costs and
divide it by number of workers. It seems to be getting close to the
actual cost.
Option 2. Calculate exact cost by an algorithm which I mentioned
before, which is pasted below for reference :
Per-subpath cost : 20 16 10 8 3 1, with 3 workers.
After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
times remaining are :
10 6 0 8 3 1
After 6 units (minimum of 10, 06, 08), the times remaining are :
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are :
2 0 0 0 1 1
After 1 units (minimum of 2, 1, 1), the times remaining are :
1 0 0 0 0 0
After 1 units (minimum of 1, 0 , 0), the times remaining are :
0 0 0 0 0 0
Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
Option 3. Get some approximation formula like you suggested. I am also
looking for such formula, just that some things are not clear to me.
The discussion of the same is below ...
2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.

So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
15), and so the max is 15 ? I surely am misinterpreting this.

No. If you have costs 8, 5, and 2 and only one process, cost is 15.
If you have two processes then for costing purposes you assume worker
1 will execute the first path (cost 8) and worker 2 will execute the
other two (cost 5 + 2 = 7), so the total cost is 8. If you have three
workers, the cost will still be 8, because there's no way to finish
the cost-8 path in less than 8 units of work.
So the part that you suggested about adding up total cost in turn to
the smallest cost; this suggestion applies to only 1 worker right ?
For more than one worker, are you suggesting to use some algorithm similar
to the one I suggested in option 2 above ? If yes, it would be great
if you again describe how that works for multiple workers. Or is it
that you were suggesting some simple approximate arithmetic that
applies to multiple workers ?
Like I mentioned, I will be happy to get such simple approximation
arithmetic that can be applied for multiple worker case. The one logic
I suggested in option 2 is something we can keep as the last option.
And option 1 is also an approximation but we would like to have a
better approximation. So wanted to clear my queries regarding option
3.
----------
Details about all the remaining changes in updated patch are below ...
On 20 March 2017 at 17:29, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 17, 2017 at 1:12 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
- The substantive changes in add_paths_to_append_rel don't look right
either. It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case. I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.When accumulate_partialappend_subpath() is called for a childrel with
a partial path, it works just like accumulate_append_subpath() when
enable_parallelappend is false. That's why, for partial child path,
the same function is called irrespective of parallel-append or
non-parallel-append case. May be mentioning this in comments should
suffice here ?I don't get it. If you can get the same effect by changing something
or not changing it, presumably it'd be better to not change it. We
try not to change things just because we can; the change should be an
improvement in some way.

- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra. The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial. If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this. (If all the cheapest subpaths
are non-partial, it's still potentially useful.)

In case of all-partial childrels, the paths are *exactly* same as
those that would have been created for enable_parallelappend=off. The
extra path is there for enable_parallelappend=on only when one or more
of the child rels do not have partial paths. Does this make sense ?

No, I don't think so. Imagine that we have three children, A, B, and
C. The cheapest partial paths have costs of 10,000 each. A, however,
has a non-partial path with a cost of 1,000. Even though A has a
partial path, we still want to consider a parallel append using the
non-partial path because it figures to be hugely faster.
Right. Now that we want to consider both cheapest partial and cheapest
non-partial path, I now get what you were saying about having an extra
path for parallel_append. I have done all of the above changes. Now we
have an extra path for enable_parallelappend=true, besides the
non-parallel partial append path.
- You've added a GUC (which is good) but not documented it (which is
bad) or added it to postgresql.conf.sample (also bad).
Done.
- You've used a loop inside a spinlock-protected critical section,
which is against project policy. Use an LWLock; define and document a
new builtin tranche ID.
Done. Used LWLock for the parallel append synchronization. But I am
not sure what "document the new builtin tranche ID" means. Didn't
find a README which documents tranche ids.
For setting pa_finished=true when a partial plan finishes, a spinlock
was used earlier; now no synchronization is used there. The spinlock
was really only needed because of the other field num_workers, which
no longer exists. I was considering whether to use the atomic read and
write API in atomics.c for pa_finished. But from what I understand,
just a plain read/write is already atomic; we require the atomics only
if there are some compound operations like increment, exchange, etc.
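For reference, the store in question (as it appears in the patch) is a
single aligned bool write, which is what the no-synchronization
argument relies on:

/* mark this subplan as done; a plain aligned one-byte store, with no
 * compound read-modify-write involved */
node->as_padesc->pa_finished[node->as_whichplan] = true;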
- The comment for pa_finished claims that it is the number of workers
executing the subplan, but it's a bool, not a count; I think this
comment is just out of date.
Done.
- paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't
we find a way to use qsort() for this instead of hand-coding a slower
algorithm? I think we could just create an array of the right length,
stick each path into it from add_paths_to_append_rel, and then qsort()
the array based on <is-partial, total-cost>. Then the result can be
turned into a list.
Now added a new function list.c list_qsort() so that it can be used in
the future.
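For reference, a rough sketch of how such a list_qsort() could work
(the exact signature and implementation in the patch may differ):

List *
list_qsort(List *list, int (*cmp) (const void *, const void *))
{
	int			len = list_length(list);
	void	  **elems;
	List	   *result = NIL;
	ListCell   *cell;
	int			i = 0;

	if (len == 0)
		return NIL;

	/* copy the list cells' data pointers into a flat array */
	elems = (void **) palloc(sizeof(void *) * len);
	foreach(cell, list)
		elems[i++] = lfirst(cell);

	/* cmp() receives pointers to the array slots, i.e. (Type **) */
	qsort(elems, len, sizeof(void *), cmp);

	/* rebuild a list in the sorted order */
	for (i = 0; i < len; i++)
		result = lappend(result, elems[i]);

	pfree(elems);
	return result;
}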
- Maybe the new helper functions in nodeAppend.c could get names
starting with exec_append_, to match the style of
exec_append_initialize_next().
Done.
- There's a superfluous whitespace change in add_paths_to_append_rel.
Didn't find exactly which, but I guess the attached latest patch does
not have it.
- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result. Integer arithmetic FTW!

Yeah fls() could be used. BTW I just found that costsize.c already has
this defined in the same way I did:
#define LOG2(x) (log(x) / 0.693147180559945)
May be we need to shift this to some common header file.

LOG2() would make sense if you're working with a value represented as
a double, but if you have an integer input, I think fls() is better.
Used fls() now.
Attachments:
ParallelAppend_v8.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b379b67..a8e3737 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3618,6 +3618,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 86db73b..2ba9472 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..1ffa803 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,10 +59,48 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
-static bool exec_append_initialize_next(AppendState *appendstate);
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished : set to true when no new workers should pick this
+ * subplan. A worker which finishes a subplan sets its pa_finished to
+ * true, so that no new worker picks this subplan. For a non-partial
+ * subplan, a worker which picks up that subplan sets it to true
+ * immediately, so as to make sure there is no more than 1 worker
+ * assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_goto_next_plan(int curplan, int first_plan,
+ int last_plan);
+
/* ----------------------------------------------------------------
* exec_append_initialize_next
*
@@ -77,6 +115,27 @@ exec_append_initialize_next(AppendState *appendstate)
int whichplan;
/*
+ * In case it's parallel-aware, follow its own logic of choosing the next
+ * subplan.
+ */
+ if (appendstate->as_padesc)
+ {
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(appendstate->ps.state->es_direction));
+
+ return exec_append_parallel_next(appendstate);
+ }
+
+ /*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +241,9 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * In case it's a sequential Append, initialize to scan first subplan.
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -205,6 +263,14 @@ ExecAppend(AppendState *node)
TupleTableSlot *result;
/*
+ * Check if we have already finished all the plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
+ /*
* figure out which subplan we are currently processing
*/
subnode = node->appendplans[node->as_whichplan];
@@ -225,14 +291,18 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * We are done with this subplan. There might be other workers still
+ * processing the last chunk of rows for this same subplan, but there's
+ * no point for new workers to run this subplan, so mark this subplan
+ * as finished.
+ */
+ if (node->as_padesc)
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ /*
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
- else
- node->as_whichplan--;
if (!exec_append_initialize_next(node))
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,264 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ /*
+ * Just setting all the pa_finished flags to false is enough. The logic
+ * of choosing the next plan in workers will take care of everything
+ * else.
+ */
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
+
+ /* Choose the optimal subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the optimal subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter index to start looking for unfinished plan,
+ * executes the subplan, then shifts ahead this counter to the next
+ * subplan, so that other workers know which next plan to choose. This
+ * way, workers choose the subplans in round robin order, and thus they
+ * get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * the node is already finished, and then this worker will once again
+ * try to choose the next subplan; but this is OK: it's just an extra
+ * "choose next subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_goto_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ /* If we didn't find any plan to execute, stop executing. */
+ if (!found)
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and
+ * advance pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_goto_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only by the parallel leader. The leader backend scans
+ * backwards from the last plan, to prevent it from taking up the most
+ * expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_goto_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * ----------------------------------------------------------------
+ */
+static int exec_append_goto_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If first_plan itself is the last
+ * plan or is past it, there is no next plan remaining; return
+ * PA_INVALID_PLAN.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 67c7de6..873c955 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built, but the cells of the
+ * original list are re-used. The caller has to pass a copy of the list if
+ * the original list must remain untouched. The comparator function receives
+ * pointers to ListCell pointers (i.e. ListCell **).
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1b9005f..7b22ca5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 474f221..44da33a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1568,6 +1568,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a1e1a87..6611e45 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1264,7 +1267,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1300,7 +1307,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel has neither a partial path nor a parallel-safe
+ * path. Give up on the parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper of the two. */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Only one of the two is valid. Choose that one. */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1378,23 +1443,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1404,9 +1485,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1459,7 +1540,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1677,6 +1759,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all its child plans have to
+ * be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppend path itself is non-partial, treat all its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1696,7 +1850,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..e8df075 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1704,6 +1705,98 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+ Cost highest_nonpartial_cost = 0;
+ int worker;
+
+ /*
+ * Make a note of the cost of the first non-partial subpath, i.e. the
+ * first one in the list, if there are any non-partial subpaths.
+ */
+ if (num_nonpartial_subpaths > 0)
+ highest_nonpartial_cost = ((Path *) linitial(subpaths))->total_cost;
+
+ worker = 1;
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per-worker figures. We need the
+ * totals for each of the subpaths, so that we can determine the
+ * total cost of the Append. We don't treat non-partial paths
+ * separately: their parallel_divisor is 1, so overall we still get
+ * a good approximation of the per-worker cost.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * The Append starts returning tuples when the child node with the
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (worker <= path->parallel_workers)
+ {
+ if (worker == 1)
+ path->startup_cost = subpath->startup_cost;
+ else
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ worker++;
+ }
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ /*
+ * No matter how fast the partial plans finish, the Append path is
+ * going to take at least the time needed for the costliest
+ * non-partial path to finish. This is an approximation: we could
+ * also consider all the other non-partial plans and how workers
+ * would get scheduled across them. But we can't calculate the total
+ * cost exactly anyway, especially because we can't determine exactly
+ * when some of the workers would start executing partial plans.
+ */
+ path->total_cost = Max(highest_nonpartial_cost, path->total_cost);
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c80c999..c517900 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5163,7 +5165,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5174,6 +5176,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index cbdea1f..9dd4ef3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3383,8 +3383,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d88738e..4069855 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fca96eb..9f962e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,69 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths) + 1 formula seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths
+ * with only 1-2 parallel_workers. However, if the subpaths'
+ * parallel_workers values are high, this formula is not suitable,
+ * because it does not take the per-subpath workers into account. E.g.,
+ * with per-subpath workers (2, 8, 8), the Append should get at least 8
+ * workers, whereas the formula gives 2. In that case, it seems better
+ * to follow the method used for calculating parallel_workers of an
+ * unpartitioned table, i.e. log3(table_size): treat the UNION query as
+ * if the data belonged to a single unpartitioned table and derive the
+ * workers from that. That works out to logb(b^w1 + b^w2 + b^w3), where
+ * w1, w2, ... are the per-subplan workers and b is some logarithmic
+ * base such as 2 or 3. It turns out that this evaluates to a value
+ * just a bit greater than max(w1, w2, w3), so we just use the maximum
+ * of the per-subplan workers. But that formula in turn gives too few
+ * workers when all paths have a single worker (meaning they are
+ * non-partial): e.g., with workers (1, 1, 1, 1, 1, 1), it is better to
+ * allocate 3 workers, whereas it allocates only 1.
+ * So we use whichever of the two methods gives the higher number of
+ * workers.
+ */
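+
+ /*
+ * For example (illustrative numbers, not from the patch): with
+ * per-subplan workers (2, 8, 8), fls(3) = 2 and max_per_plan_workers
+ * = 8, so the code below requests rint(Max(2, 8) + 1) = 9 workers;
+ * with six single-worker (non-partial) subpaths, fls(6) = 3 and
+ * max_per_plan_workers = 1, so it requests 4. Both results are then
+ * capped at max_parallel_workers_per_gather.
+ */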
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1278,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..36b8750 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4feb26a..4f21c2e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a02b154..5383509 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4a95e16..1950192 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1c88a79..70ccdbf 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1112,10 +1112,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
*/
typedef struct AppendPath
{
@@ -1123,6 +1127,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81640de..2203ab4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Attached is the updated patch that handles the changes for all the
comments except the cost changes part. Details about the specific
changes are after the cost-related points discussed below.

For non-partial paths, I was considering the following 3 options:
Option 1. Just take the sum of the total non-partial child costs and
divide it by the number of workers. It seems to get close to the
actual cost.
If the costs for all children are about equal, then that works fine.
But when they are very unequal, it's highly misleading: for example,
with child costs 20, 1, and 1 and two workers, the average is 11, but
the Append cannot finish before the 20-cost child does.
Option 2. Calculate the exact cost by an algorithm I mentioned
before, pasted below for reference:
Per-subpath costs: 20 16 10 8 3 1, with 3 workers.
After 10 time units (the minimum of the first 3, i.e. 20, 16, 10), the
times remaining are:
10 6 0 8 3 1
After 6 units (minimum of 10, 6, 8), the times remaining are:
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are:
2 0 0 0 1 1
After 1 unit (minimum of 2, 1, 1), the times remaining are:
1 0 0 0 0 0
After 1 unit (minimum of 1, 0, 0), the times remaining are:
0 0 0 0 0 0
Now add up the above time chunks: 10 + 6 + 2 + 1 + 1 = 20
This gives the same answer as what I was proposing, but I believe it's
more complicated to compute. The way my proposal would work in this
case is that we would start with an array C[3] (since there are three
workers), with all entries 0. Logically C[i] represents the amount of
work to be performed by worker i. We add each path in turn to the
worker whose array entry is currently smallest; in the case of a tie,
just pick the first such entry.
So in your example we do this:
C[0] += 20;
C[1] += 16;
C[2] += 10;
/* C[2] is smaller than C[0] or C[1] at this point, so we add the next
path to C[2] */
C[2] += 8;
/* after the previous line, C[1] is now the smallest, so add to that
entry next */
C[1] += 3;
/* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
C[2] += 1;
/* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
Now we just take the highest array entry, in this case C[0], as the
total cost.
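
For concreteness, here is a minimal standalone C sketch of that
bookkeeping (illustrative only: the function name, the fixed-size work
array, and the hard-coded inputs are assumptions, not code from the
patch):

#include <stdio.h>

#define MAX_WORKERS 64

/*
 * Greedy assignment: give each subpath, in list order, to the worker
 * with the least accumulated cost. The largest per-worker total at
 * the end (the makespan) approximates the parallel Append's total
 * cost.
 */
static double
estimate_append_total_cost(const double *subpath_costs, int npaths,
                           int nworkers)
{
    double  work[MAX_WORKERS] = {0};
    double  makespan = 0.0;
    int     i, w;

    for (i = 0; i < npaths; i++)
    {
        int     min_w = 0;

        /* Find the least-loaded worker; ties go to the first entry. */
        for (w = 1; w < nworkers; w++)
            if (work[w] < work[min_w])
                min_w = w;
        work[min_w] += subpath_costs[i];
    }

    for (w = 0; w < nworkers; w++)
        if (work[w] > makespan)
            makespan = work[w];
    return makespan;
}

int
main(void)
{
    double  costs[] = {20, 16, 10, 8, 3, 1};

    printf("%.1f\n", estimate_append_total_cost(costs, 6, 3));
    return 0;
}

With the costs from the example above and 3 workers, this prints 20.0,
matching both the manual simulation and the C[0] = 20 result.
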
Comments on this latest version:
In my previous review, I said that you should "define and document a
new builtin tranche ID"; you did the first but not the second. See
the table in monitoring.sgml.
Definition of exec_append_goto_next_plan should have a line break
after the return type, per usual PostgreSQL style rules.
- * initialize to scan first subplan
+ * In case it's a sequential Append, initialize to scan first subplan.
This comment is confusing because the code is executed whether it's
parallel or not. I think it might be better to write something like
"initialize to scan first subplan (but note that we'll override this
later in the case of a parallel append)"
/*
+ * Check if we are already finished plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
There seems to be no reason why this couldn't be hoisted out of the
loop. Actually, I think Ashutosh pointed this out before, but I
didn't understand at that time what his point was. Looking back, I
see that he also pointed out that the as_padesc test isn't necessary,
which is also true.
+ if (node->as_padesc)
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
I think you should move this logic inside exec_append_parallel_next.
That would avoid testing node->pa_desc an extra time for non-parallel
append. I note that the comment doesn't explain why it's safe to do
this without taking the lock. I think we could consider doing it with
the lock held, but it probably is safe, because we're only setting it
from false to true. If someone else does the same thing, that won't
hurt anything, and if someone else fails to see our update, then the
worst-case scenario is that they'll try to execute a plan that has no
chance of returning any more rows. That's not so bad. Actually,
looking further, you do have a comment explaining that, but it's in
exec_append_parallel_next() where the value is used, rather than here.
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;
Putting the shm_toc_insert call after we fully initialize the
structure seems better than putting it after we've done some of the
initialization and before we've done the rest.
+ /* Choose the optimal subplan to be executed. */
I think the word "first" would be more accurate than "optimal". We
can only hope to pick the optimal one, but whichever one we pick is
definitely the one we're executing first!
I think the loop in exec_append_parallel_next() is a bit confusing.
It has three exit conditions, one checked at the top of the loop and
two other ways to exit via break statements. Sometimes it exits
because whichplan == PA_INVALID_PLAN was set by
exec_append_goto_next_plan(), and other times it exits because
whichplan == initial_plan and then it sets whichplan ==
PA_INVALID_PLAN itself. I feel like this whole function could be
written more simply somehow.
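
For discussion's sake, here is one possible simpler shape, sketched
against the v8 data structures (untested, ignoring the leader's special
case, and with a hypothetical name): a single bounded loop probes at
most as_nplans slots, and the wrap-around is handled in one place.

static bool
exec_append_parallel_next_sketch(AppendState *state)
{
    ParallelAppendDesc padesc = state->as_padesc;
    int     nplans = state->as_nplans;
    int     first_partial = ((Append *) state->ps.plan)->first_partial_plan;
    int     plan = PA_INVALID_PLAN;
    int     i;

    LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);

    if (padesc->pa_next_plan != PA_INVALID_PLAN)
    {
        plan = padesc->pa_next_plan;
        for (i = 0; i < nplans; i++)
        {
            /* Wrap around, skipping the finished non-partial prefix. */
            if (plan >= nplans || plan < padesc->pa_first_plan)
                plan = padesc->pa_first_plan;

            if (plan < nplans && !padesc->pa_finished[plan])
                break;      /* found an unfinished subplan */
            plan++;
        }
        if (i == nplans)
            plan = PA_INVALID_PLAN;     /* probed everything; all finished */
    }

    if (plan != PA_INVALID_PLAN && plan < first_partial)
    {
        /* A non-partial plan is taken by at most one worker. */
        padesc->pa_finished[plan] = true;
        padesc->pa_first_plan = plan + 1;
    }

    state->as_whichplan = plan;
    padesc->pa_next_plan = (plan == PA_INVALID_PLAN) ? plan : plan + 1;

    LWLockRelease(&padesc->pa_lock);
    return (plan != PA_INVALID_PLAN);
}

This makes the exit rule explicit: after as_nplans probes without
finding an unfinished plan, everything is finished, and PA_INVALID_PLAN
is both stored and returned.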
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 23 March 2017 at 05:55, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Attached is the updated patch that handles the changes for all the
comments except the cost changes part. Details about the specific
changes are after the cost-related points discussed below.

For non-partial paths, I was considering the following 3 options:
Option 1. Just take the sum of the total non-partial child costs and
divide it by the number of workers. It seems to get close to the
actual cost.

If the costs for all children are about equal, then that works fine.
But when they are very unequal, then it's highly misleading.

Option 2. Calculate the exact cost by an algorithm I mentioned
before, pasted below for reference:
Per-subpath costs: 20 16 10 8 3 1, with 3 workers.
After 10 time units (the minimum of the first 3, i.e. 20, 16, 10), the
times remaining are:
10 6 0 8 3 1
After 6 units (minimum of 10, 6, 8), the times remaining are:
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are:
2 0 0 0 1 1
After 1 unit (minimum of 2, 1, 1), the times remaining are:
1 0 0 0 0 0
After 1 unit (minimum of 1, 0, 0), the times remaining are:
0 0 0 0 0 0
Now add up the above time chunks: 10 + 6 + 2 + 1 + 1 = 20

This gives the same answer as what I was proposing
Ah I see.
but I believe it's more complicated to compute.
Yes, a bit, particularly because in my algorithm I would have to do
'n' subtractions each time in the case of 'n' workers. But it looked
more natural because it follows exactly the way we would calculate
manually.
The way my proposal would work in this
case is that we would start with an array C[3] (since there are three
workers), with all entries 0. Logically C[i] represents the amount of
work to be performed by worker i. We add each path in turn to the
worker whose array entry is currently smallest; in the case of a tie,
just pick the first such entry.

So in your example we do this:
C[0] += 20;
C[1] += 16;
C[2] += 10;
/* C[2] is smaller than C[0] or C[1] at this point, so we add the next
path to C[2] */
C[2] += 8;
/* after the previous line, C[1] is now the smallest, so add to that
entry next */
C[1] += 3;
/* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
C[2] += 1;
/* final result: C[0] = 20, C[1] = 19, C[2] = 19 */

Now we just take the highest array entry, in this case C[0], as the
total cost.
Wow. The way your final result exactly tallies with my algorithm
result is very interesting. This looks like some maths or computer
science theory that I am not aware of.
I am currently coding the algorithm using your method. Meanwhile
attached is a patch that takes care of your other comments, details of
which are below...
In my previous review, I said that you should "define and document a
new builtin tranche ID"; you did the first but not the second. See
the table in monitoring.sgml.
Yeah, I tried to search for how TBM did it in the source, but I guess
I didn't run the "git grep" commands correctly, so the results did not
include monitoring.sgml, and I thought maybe you meant something else
by "document".
Added changes in monitoring.sgml now.
Definition of exec_append_goto_next_plan should have a line break
after the return type, per usual PostgreSQL style rules.
Oops. Done.
- * initialize to scan first subplan
+ * In case it's a sequential Append, initialize to scan first subplan.

This comment is confusing because the code is executed whether it's
parallel or not. I think it might be better to write something like
"initialize to scan first subplan (but note that we'll override this
later in the case of a parallel append)"
Done.
+ /*
+ * Check if we are already finished plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+     return ExecClearTuple(node->ps.ps_ResultTupleSlot);

There seems to be no reason why this couldn't be hoisted out of the
loop. Actually, I think Ashutosh pointed this out before, but I
didn't understand at that time what his point was. Looking back, I
see that he also pointed out that the as_padesc test isn't necessary,
which is also true.
I am assuming both yours and Ashutosh's concern is that this check
will be executed for *each* tuple returned, which needs to be
avoided. Actually, just moving it out of the loop is not going to
solve the runs-for-each-tuple issue; it will still execute for each
tuple. But on reflection, I agree it can be taken out of the loop
anyway: not to solve the per-tuple issue, but because it need not run
on each iteration of the loop, which is only there to advance to the
next subplan.
When a worker tries to choose a plan to execute at the very beginning
(i.e. in ExecAppendInitializeWorker()), it sometimes finds there is no
plan to execute, because the other workers have already taken them all
and they are already finished, or they are all non-partial plans. In
short, pa_finished = true for all subplans, so as_whichplan has to be
PA_INVALID_PLAN. To get rid of the extra check in ExecAppend(), if all
plans are finished in ExecAppendInitializeWorker(), we could assign
as_whichplan to a partial plan that has already finished, so that
ExecAppend() would execute this finished subplan and just return NULL.
But if all plans are non-partial, we cannot do that.
Now, when ExecAppend() is called, there is no way to know whether this
is the first time ExecProcNode() is executed or not. So we have to
keep checking the node->as_whichplan == PA_INVALID_PLAN condition.
My earlier response to Ashutosh's feedback on this same point is
pasted below, where some possible improvements are discussed:
The way ExecProcNode() gets called, there is no separate special code
path that runs instead of ExecProcNode() when a tuple is fetched for
the first time. I mean, we cannot prevent ExecProcNode() from getting
called when as_whichplan is invalid right from the beginning.
One thing we could do is have a special slot in AppendState->appendplans[]
containing some dummy execution node that just returns a NULL tuple, and
initially make as_whichplan point to this slot. But I think it is not
worth doing this.
We can instead reduce the if condition to:
if (node->as_whichplan == PA_INVALID_PLAN)
{
    Assert(node->as_padesc != NULL);
    return ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
BTW, the loop you mentioned is not for returning tuples; it is for
iterating to the next subplan. Even if we take the condition out and
put it at the beginning of ExecAppend(), the issue will remain.
+ if (node->as_padesc)
+     node->as_padesc->pa_finished[node->as_whichplan] = true;

I think you should move this logic inside exec_append_parallel_next().
That would avoid testing node->as_padesc an extra time for a
non-parallel append.
Actually, exec_append_parallel_next() is called from other places as
well, so we cannot set pa_finished to true inside it.
But I have made the changes in another way: I have taken
exec_append_parallel_next() out of exec_append_initialize_next(), and
put two different conditional code blocks in ExecAppend(), one of
which calls set_finished() followed by exec_append_parallel_next(),
while the other calls exec_append_initialize_next() (now renamed to
exec_append_seq_next()).
But one thing to note is that this condition is not evaluated for each
tuple; it runs only when moving to the next subplan.
I note that the comment doesn't explain why it's safe to do
this without taking the lock. I think we could consider doing it with
the lock held, but it probably is safe, because we're only setting it
from false to true. If someone else does the same thing, that won't
hurt anything, and if someone else fails to see our update, then the
worst-case scenario is that they'll try to execute a plan that has no
chance of returning any more rows. That's not so bad. Actually,
looking further, you do have a comment explaining that, but it's in
exec_append_parallel_next() where the value is used, rather than here.
Yes, right.
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+ node->as_padesc = padesc;

Putting the shm_toc_insert call after we fully initialize the
structure seems better than putting it after we've done some of the
initialization and before we've done the rest.
Done. Also found out that I was memset()ing only pa_finished[]. Now
there is a memset for the whole ParallelAppendDesc structure.
+ /* Choose the optimal subplan to be executed. */
I think the word "first" would be more accurate than "optimal". We
can only hope to pick the optimal one, but whichever one we pick is
definitely the one we're executing first!
Done.
I think the loop in exec_append_parallel_next() is a bit confusing.
It has three exit conditions, one checked at the top of the loop and
two other ways to exit via break statements. Sometimes it exits
because whichplan == PA_INVALID_PLAN was set by
exec_append_goto_next_plan(), and other times it exits because
whichplan == initial_plan
Yeah, we cannot move the (whichplan == initial_plan) check to the top
of the for(;;), because whichplan is initially initial_plan and we
have to execute the loop at least once (unless whichplan is invalid).
And we cannot move the (whichplan != PA_INVALID_PLAN) condition to the
bottom, because whichplan can be invalid right at the beginning if
pa_next_plan itself is PA_INVALID_PLAN.
and then it sets whichplan == PA_INVALID_PLAN itself.
It sets whichplan to PA_INVALID_PLAN only when it does not find any
plan to execute next. This is essential, particularly because
initially, when ExecAppendInitialize[Worker/DSM]() is called, it may
not be able to set as_whichplan to any valid value.
I feel like this whole function could be written more simply somehow.
Yeah, the main reason it is a bit complicated is that we are simulating
a circular array, with the added optimization that we can skip the
finished non-partial plans while wrapping around to the next plan. I
have tried to add a couple more comments.
Also renamed exec_append_goto_next_plan() to
exec_append_get_next_plan(), since it does not actually shift any
counter; it just returns what the next counter would be.
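
To make the wrap-around behaviour concrete, here is a tiny standalone
model of the circular scan (names and numbers are illustrative, not
patch code). The point is that it wraps to first_plan rather than to
0, so the finished non-partial prefix is never revisited:

#include <stdio.h>

#define PA_INVALID_PLAN (-1)

/* Advance from curplan, wrapping around to first_plan instead of 0. */
static int
get_next_plan(int curplan, int first_plan, int last_plan)
{
    if (curplan < last_plan)
        return curplan + 1;
    /* At the last plan: wrap, unless nothing else remains. */
    return (first_plan >= last_plan) ? PA_INVALID_PLAN : first_plan;
}

int
main(void)
{
    int     plan = 2;
    int     i;

    /* 5 subplans (0..4); 0 and 1 are finished non-partial plans. */
    for (i = 0; i < 6; i++)
    {
        printf("%d ", plan);    /* prints: 2 3 4 2 3 4 */
        plan = get_next_plan(plan, 2, 4);
    }
    printf("\n");
    return 0;
}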
Attachments:
ParallelAppend_v9.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b379b67..a8e3737 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3618,6 +3618,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcb2d33..49a053a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -832,7 +832,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1092,6 +1092,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 86db73b..2ba9472 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i] : true when no new worker should pick subplan i. A worker
+ * which finishes a subplan should set its entry to true, so that no new
+ * worker picks this subplan. For a non-partial subplan, a worker which
+ * picks up that subplan should immediately set its entry to true, so as
+ * to make sure there is no more than 1 worker assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check if we have already finished all plans of the parallel append.
+ * This can happen if all the subplans were finished before this worker
+ * even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is an ordinary (non-parallel-aware) Append. Follow the usual
+ * sequential logic of choosing the next subplan.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point for new workers to run this subplan, so
+ * mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the first subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses
+ * the shared pa_next_plan counter to start looking for an unfinished
+ * plan, executes that plan, and then shifts the counter ahead to the
+ * next subplan, so that other workers know which plan to choose next.
+ * This way, workers choose the subplans in round-robin order, and thus
+ * they get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after a child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * it is already finished, and then once again this worker will try to
+ * choose the next subplan; but this is OK : it's just an extra
+ * "choose next subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute, stop executing, and indicate
+ * the same for other workers to know that there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only if it's a parallel leader. The backend should scan
+ * backwards from the last plan. This is to prevent it from taking up
+ * the most expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 67c7de6..873c955 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built but the cells of the original
+ * list are re-used. Caller has to pass a copy of the list if the original list
+ * needs to be untouched. Effectively, the comparator function is passed
+ * pointers to ListCell* pointers.
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1b9005f..7b22ca5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 474f221..44da33a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1568,6 +1568,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a1e1a87..6611e45 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1264,7 +1267,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1300,7 +1307,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. Drop the idea for parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Either one is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1378,23 +1443,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1404,9 +1485,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1459,7 +1540,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1677,6 +1759,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all its child plans have to
+ * be partial : we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1696,7 +1850,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..e8df075 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -1704,6 +1705,98 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+ Cost highest_nonpartial_cost = 0;
+ int worker;
+
+ /*
+ * Make a note of the cost of the first non-partial subpath, i.e. the
+ * first one in the list, if there are any non-partial subpaths at all.
+ */
+ if (num_nonpartial_subpaths > 0)
+ highest_nonpartial_cost = ((Path *) linitial(subpaths))->total_cost;
+
+ worker = 1;
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per-worker figures. We need the
+ * totals for each of the subpaths, so that we can determine the total
+ * cost of the Append. We don't consider non-partial paths separately:
+ * the parallel_divisor for non-partial paths is 1, so overall we
+ * get a good approximation of the per-worker cost.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (worker <= path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ worker++;
+ }
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ /*
+ * No matter how fast the partial plans finish, the Append path is
+ * going to take at least the time needed for the costliest non-partial
+ * path to finish. This is actually an approximation. We can even
+ * consider all the other non-partial plans and how workers would get
+ * scheduled to determine the cost of finishing the non-partial paths.
+ * But we anyway can't calculate the total cost exactly, especially
+ * because we can't determine exactly when some of the workers would
+ * start executing partial plans.
+ */
+ path->total_cost = Max(highest_nonpartial_cost, path->total_cost);
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c80c999..c517900 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5163,7 +5165,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5174,6 +5176,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 68d74cb..1529396 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3383,8 +3383,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d88738e..4069855 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fca96eb..9f962e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,69 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The formula log2(number_of_subpaths) + 1 seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths with
+ * only 1-2 parallel_workers. However, if a subpath's parallel_workers is
+ * high, this formula is not suitable, because it does not take per-subpath
+ * workers into account. E.g., with workers (2, 8, 8), the Append should
+ * get at least 8 workers, whereas the formula gives 2. In that case, it
+ * seems better to follow the method used for calculating parallel_workers
+ * of an unpartitioned table : log3(table_size). So we treat the UNION
+ * query as if the data belonged to a single unpartitioned table, and then
+ * derive its workers. That gives : logb(b^w1 + b^w2 + b^w3), where w1, w2,
+ * ... are the per-subplan workers and b is some logarithmic base such as 2
+ * or 3. It turns out that this evaluates to a value just a bit greater
+ * than max(w1, w2, w3), so we just use the maximum of the per-subplan
+ * workers. But that formula gives too few workers when all paths have a
+ * single worker (meaning they are non-partial) : e.g. with workers
+ * (1, 1, 1, 1, 1, 1), it is better to allocate 3 workers, whereas it
+ * allocates only 1. So we use whichever method gives the higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1278,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..36b8750 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4feb26a..4f21c2e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a02b154..5383509 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4a95e16..1950192 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1c88a79..70ccdbf 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1112,10 +1112,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
@@ -1123,6 +1127,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81640de..2203ab4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On 23 March 2017 at 16:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 23 March 2017 at 05:55, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:
Attached is the updated patch that handles the changes for all the
comments except the cost changes part. Details about the specific
changes are after the cost-related points discussed below.

For non-partial paths, I was checking the following 3 options :
Option 1. Just take the sum of the total non-partial child costs and
divide it by the number of workers. It seems to be getting close to the
actual cost.

If the costs for all children are about equal, then that works fine.
But when they are very unequal, then it's highly misleading.

Option 2. Calculate the exact cost by an algorithm which I mentioned
before, which is pasted below for reference :
Per-subpath cost : 20 16 10 8 3 1, with 3 workers.
After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
times remaining are :
10 6 0 8 3 1
After 6 units (minimum of 10, 06, 08), the times remaining are :
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are :
2 0 0 0 1 1
After 1 units (minimum of 2, 1, 1), the times remaining are :
1 0 0 0 0 0
After 1 units (minimum of 1, 0 , 0), the times remaining are :
0 0 0 0 0 0
Now add up the above time chunks : 10 + 6 + 2 + 1 + 1 = 20

This gives the same answer as what I was proposing
Ah I see.
but I believe it's more complicated to compute.
Yes a bit, particularly because in my algorithm, I would have to do
'n' subtractions each time, in case of 'n' workers. But it looked more
natural because it follows exactly the way we manually calculate.

The way my proposal would work in this
case is that we would start with an array C[3] (since there are three
workers), with all entries 0. Logically C[i] represents the amount of
work to be performed by worker i. We add each path in turn to the
worker whose array entry is currently smallest; in the case of a tie,
just pick the first such entry.

So in your example we do this:
C[0] += 20;
C[1] += 16;
C[2] += 10;
/* C[2] is smaller than C[0] or C[1] at this point, so we add the next
path to C[2] */
C[2] += 8;
/* after the previous line, C[1] is now the smallest, so add to that
entry next */
C[1] += 3;
/* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
C[2] += 1;
/* final result: C[0] = 20, C[1] = 19, C[2] = 19 */

Now we just take the highest entry that appears in any array, which in
this case is C[0], as the total cost.

Wow. The way your final result exactly tallies with my algorithm's
result is very interesting. This looks like some maths or computer
science theory that I am not aware of.

I am currently coding the algorithm using your method.
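Just to spell the proposal out in code, below is a tiny standalone
sketch of that greedy assignment (to be clear, this is not the patch's
append_nonpartial_cost(); the function name, the 64-worker array bound,
and the driver are made up here, and it assumes the non-partial
subpaths arrive sorted by descending total cost, which is how the patch
keeps them):

#include <stdio.h>

/* greedy schedule: hand each subpath to the currently least-loaded worker */
static double
nonpartial_cost_sketch(const double *cost, int npaths, int nworkers)
{
    double load[64] = {0};  /* per-worker accumulated cost; nworkers <= 64 assumed */
    double max_load = 0;

    for (int i = 0; i < npaths; i++)
    {
        int min_w = 0;

        /* find the least-loaded worker; ties go to the first such entry */
        for (int w = 1; w < nworkers; w++)
            if (load[w] < load[min_w])
                min_w = w;
        load[min_w] += cost[i];
    }

    /* the Append is done when the busiest worker is done */
    for (int w = 0; w < nworkers; w++)
        if (load[w] > max_load)
            max_load = load[w];
    return max_load;
}

int
main(void)
{
    double cost[] = {20, 16, 10, 8, 3, 1};  /* the example above */

    printf("%g\n", nonpartial_cost_sketch(cost, 6, 3));    /* prints 20 */
    return 0;
}

For the thread's example this yields 20, matching both the time-chunk
calculation and the C[] trace above.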
While I was coding this, I was considering whether Path->rows should
also be calculated the way total cost is, i.e. separately for
non-partial subpaths and for partial subpaths. I think for rows, we can
just take total_rows divided by workers for non-partial paths, and this
approximation should suffice. It looks odd for rows to be treated with
the same algorithm we chose for the total cost of non-partial paths.
Meanwhile, attached is a WIP patch v10. The only change in this patch
w.r.t. the last patch (v9) is that this one has a new function defined,
append_nonpartial_cost(). Just sending this to show what the algorithm
looks like; it isn't called yet.
Attachments:
ParallelAppend_v10_WIP.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de3540..a7aad08 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3643,6 +3643,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e930731..6f51372 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -832,7 +832,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1096,6 +1096,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b91b663..8b0ec2c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished: whether the subplan is done executing. A worker which
+ * finishes a subplan should set its pa_finished flag to true, so that
+ * no new worker picks this subplan. For a non-partial subplan, the
+ * worker which picks it up should immediately set the flag to true, to
+ * make sure that no more than one worker is ever assigned to it.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check whether we have already finished all plans of the parallel
+ * append. This can happen if all the subplans are finished before this
+ * worker has even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is a non-parallel-aware Append. Just advance to the next
+ * subplan in the appropriate direction.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point in new workers starting this subplan, so
+ * mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter index to start looking for unfinished plan,
+ * executes the subplan, then shifts ahead this counter to the next
+ * subplan, so that other workers know which next plan to choose. This
+ * way, workers choose the subplans in round robin order, and thus they
+ * get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * it is already finished, and then once again this worker will try to
+ * choose the next subplan; but this is OK: it's just an extra
+ * "choose_next_subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute. Stop executing, and mark the
+ * shared state so that other workers know there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only by the parallel leader. The leader scans backwards
+ * from the last plan, so as to prevent it from taking up the most
+ * expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 93d4eb2..60f0b7e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built but the cells of the original
+ * list are re-used. Caller has to pass a copy of the list if the original list
+ * needs to be untouched. Effectively, the comparator function is passed
+ * pointers to ListCell* pointers.
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1b9005f..7b22ca5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 474f221..44da33a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1568,6 +1568,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a1e1a87..6611e45 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1264,7 +1267,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1300,7 +1307,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. Drop the idea for parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Either one is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1378,23 +1443,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1404,9 +1485,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1459,7 +1540,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1677,6 +1759,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all its child plans have to
+ * be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1696,7 +1850,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..9eff4b9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,7 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(Path *path, List *subpaths);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1704,6 +1706,161 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of non-partial paths of Append node.
+ * subpaths contains only non-partial subpaths.
+ */
+static Cost
+append_nonpartial_cost(Path *path, List *subpaths)
+{
+ Cost *costarr;
+ int len = path->parallel_workers;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int min_index;
+ int max_index;
+
+ /* Build the cost array out of the first 'parallel_workers' elements of subpaths */
+ costarr = (Cost *) palloc(sizeof(Cost) * len);
+ i = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+ if (i == len)
+ break;
+ costarr[i++] = subpath->total_cost;
+ }
+
+ /*
+ * Since the subpaths are non-partial paths, the array is initially sorted
+ * by decreasing cost. So choose the last one for the index with minimum
+ * cost.
+ */
+ min_index = len - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < len; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+
+ for (max_index = i = 0; i < len; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+ Cost highest_nonpartial_cost = 0;
+ int worker;
+
+ /*
+ * Make a note of the cost of the first non-partial subpath, i.e. the
+ * first one in the list, if there are any non-partial subpaths.
+ */
+ if (num_nonpartial_subpaths > 0)
+ highest_nonpartial_cost = ((Path *) linitial(subpaths))->total_cost;
+
+ worker = 1;
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per-worker figures. We need the
+ * totals for each of the subpaths, so that we can determine the total
+ * cost of Append. We don't consider non-partial paths separately; the
+ * parallel_divisor for non-partial paths is 1, and so overall we
+ * get a good approximation of per-worker cost.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (worker <= path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ worker++;
+ }
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ /*
+ * No matter how fast the partial plans finish, the Append path is
+ * going to take at least the time needed for the costliest non-partial
+ * path to finish. This is actually an approximation. We can even
+ * consider all the other non-partial plans and how workers would get
+ * scheduled to determine the cost of finishing the non-partial paths.
+ * But we anyway can't calculate the total cost exactly, especially
+ * because we can't determine exactly when some of the workers would
+ * start executing partial plans.
+ */
+ path->total_cost = Max(highest_nonpartial_cost, path->total_cost);
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c80c999..c517900 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5163,7 +5165,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5174,6 +5176,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 68d74cb..1529396 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3383,8 +3383,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d88738e..4069855 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fca96eb..9f962e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,69 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths) + 1 formula seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths with
+ * only 1-2 parallel workers each. But if the subpaths' parallel_workers
+ * values are high, this formula is not suitable, because it does not take
+ * per-subpath workers into account. E.g., with workers (2, 8, 8), the
+ * Append should get at least 8 workers, whereas the formula gives 2. In
+ * that case, it seems better to follow the method used for calculating
+ * parallel_workers of an unpartitioned table: log3(table_size). So we
+ * treat the UNION query as if the data belongs to a single unpartitioned
+ * table, and then derive its workers: logb(b^w1 + b^w2 + b^w3), where
+ * w1, w2, ... are the per-subplan workers and b is some logarithmic base
+ * such as 2 or 3. It turns out that this evaluates to a value just a bit
+ * greater than max(w1, w2, w3), so we just use the maximum-of-workers
+ * formula. But that formula gives too few workers when all paths have a
+ * single worker (meaning they are non-partial): e.g. with workers
+ * (1, 1, 1, 1, 1, 1), it is better to allocate 3 workers, whereas it
+ * allocates only 1. So we use whichever method gives the higher number.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1278,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..36b8750 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 291bf76..3942e8a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a02b154..5383509 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4a95e16..1950192 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1c88a79..70ccdbf 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1112,10 +1112,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
@@ -1123,6 +1127,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81640de..2203ab4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On Fri, Mar 24, 2017 at 12:38 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Meanwhile, attached is a WIP patch v10. The only change in this patch
w.r.t. the last patch (v9) is that this one has a new function,
append_nonpartial_cost(). I am just sending this to show what the
algorithm looks like; the function isn't called anywhere yet.
Hi,
I have applied the patch on the latest pg sources (on commit
457a4448732881b5008f7a3bcca76fc299075ac3). configure and "make all
install" ran successfully, but initdb failed with the below error.
[edb@localhost bin]$ ./initdb -D data
The files belonging to this database system will be owned by user "edb".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory data ... ok
creating subdirectories ... ok
selecting default max_connections ... sh: line 1: 3106 Aborted
(core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 3112 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=50 -c shared_buffers=500 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 3115 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=40 -c shared_buffers=400 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 3118 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=30 -c shared_buffers=300 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 3121 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=20 -c shared_buffers=200 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 3124 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=10 -c shared_buffers=100 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
10
selecting default shared_buffers ... sh: line 1: 3127 Aborted
(core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=10 -c shared_buffers=16384 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... TRAP:
FailedAssertion("!(LWLockTranchesAllocated >=
LWTRANCHE_FIRST_USER_DEFINED)", File: "lwlock.c", Line: 501)
child process was terminated by signal 6: Aborted
initdb: removing data directory "data"
[edb@localhost bin]$
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On 24 March 2017 at 13:11, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
I have applied the patch on the latest pg sources (on commit
457a4448732881b5008f7a3bcca76fc299075ac3). configure and "make all
install" ran successfully, but initdb failed with the below error.
FailedAssertion("!(LWLockTranchesAllocated >=
LWTRANCHE_FIRST_USER_DEFINED)", File: "lwlock.c", Line: 501)
Thanks for reporting, Rajkumar.
With the new PARALLEL_APPEND tranche ID, the LWTRANCHE_FIRST_USER_DEFINED
value has crossed 64, so we need to increase the initial size of
LWLockTrancheArray from 64 to 128. Attached is the updated patch.
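For context, here is a tiny standalone sketch of the invariant behind the
failed assertion (illustrative only, not the actual lwlock.c code; the
value 65 below is an assumed stand-in for the real enum value):

#include <assert.h>

/*
 * The tranche-name array allocated at startup must have room for every
 * built-in tranche ID, i.e. at least LWTRANCHE_FIRST_USER_DEFINED
 * entries. Adding LWTRANCHE_PARALLEL_APPEND pushed that enum value past
 * the old allocation of 64.
 */
enum { FIRST_USER_DEFINED_TRANCHE = 65 };	/* assumed value, > 64 */

int
main(void)
{
	int		tranches_allocated = 128;	/* was 64 before this fix */

	/* the check that fired during initdb's bootstrap run */
	assert(tranches_allocated >= FIRST_USER_DEFINED_TRANCHE);
	return 0;
}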
Attachment: ParallelAppend_v11.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de3540..a7aad08 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3643,6 +3643,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e930731..6f51372 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -832,7 +832,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1096,6 +1096,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b91b663..8b0ec2c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i]: true if no new worker should pick up subplan i. A
+ * worker that finishes a subplan sets its pa_finished flag to true, so
+ * that no new worker picks that subplan. For a non-partial subplan, the
+ * worker that picks it up sets the flag to true immediately, to make
+ * sure that no more than one worker is ever assigned to that subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check whether we have already run out of subplans in a parallel
+ * append. This can happen if all the subplans finish before this
+ * worker has even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * Not parallel-aware. Follow the usual sequential logic of choosing
+ * the next subplan.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * This is a parallel-aware Append, and we are done with this
+ * subplan. There might be other workers still processing the last
+ * chunk of rows for this same subplan, but there's no point in new
+ * workers picking it up, so mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses
+ * the shared pa_next_plan counter as the index at which to start looking
+ * for an unfinished plan, executes that subplan, and then advances the
+ * counter to the next subplan so that other workers know which plan to
+ * choose next. This way, workers pick subplans in round-robin order and
+ * thus get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: there is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a null tuple since
+ * the node is already finished, and this worker will then once again
+ * try to choose the next subplan; but this is OK: it's just one extra
+ * "choose next subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute, stop executing, and indicate
+ * the same for other workers to know that there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and
+ * advance pa_first_plan past it.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only by the parallel leader. The leader scans backwards
+ * from the last plan, to prevent it from taking up the most expensive
+ * non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If first_plan itself is the last
+ * plan, or is past the last plan, then there is no next plan
+ * remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 93d4eb2..60f0b7e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A new sorted list is built, but the cells of the
+ * original list are re-used. The caller has to pass a copy of the list if the
+ * original list needs to stay untouched. The comparator function is
+ * effectively passed pointers to the ListCell pointers (ListCell **).
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1b9005f..7b22ca5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 474f221..44da33a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1568,6 +1568,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a1e1a87..6611e45 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1264,7 +1267,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1300,7 +1307,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel has neither a partial path nor a
+ * parallel-safe path. Drop the idea of a parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Exactly one of the two is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1378,23 +1443,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider a non-parallel partial Append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1404,9 +1485,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1459,7 +1540,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1677,6 +1759,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all of its child plans have
+ * to be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppend path itself is non-partial, treat all of
+ * its subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1696,7 +1850,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..9eff4b9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,7 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(Path *path, List *subpaths);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1704,6 +1706,161 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of the non-partial subpaths of an
+ * Append node. 'subpaths' must contain only non-partial subpaths.
+ */
+static Cost
+append_nonpartial_cost(Path *path, List *subpaths)
+{
+ Cost *costarr;
+ int len = path->parallel_workers;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int min_index;
+ int max_index;
+
+ /* Build the cost array out of the first 'parallel_workers' elements of subpaths */
+ costarr = (Cost *) palloc(sizeof(Cost) * len);
+ i = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+ if (i == len)
+ break;
+ costarr[i++] = subpath->total_cost;
+ }
+
+ /*
+ * Since the non-partial subpaths are sorted by decreasing cost, the
+ * array is initially in decreasing order as well, so the last element
+ * is the one with the minimum cost.
+ */
+ min_index = len - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < len; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+
+ for (max_index = i = 0; i < len; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (path->parallel_aware)
+ {
+ int parallel_divisor;
+ Cost highest_nonpartial_cost = 0;
+ int worker;
+
+ /*
+ * Make a note of the cost of the first non-partial subpath, i.e. the
+ * first one in the list, if there are any non-partial subpaths at all.
+ */
+ if (num_nonpartial_subpaths > 0)
+ highest_nonpartial_cost = ((Path *) linitial(subpaths))->total_cost;
+
+ worker = 1;
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * The subpath rows and cost are per-worker figures. We need the
+ * totals for each of the subpaths, so that we can determine the total
+ * cost of Append. We don't consider non-partial paths separately: the
+ * parallel_divisor for non-partial paths is 1, so overall we get a
+ * good approximation of the per-worker cost.
+ */
+ parallel_divisor = get_parallel_divisor(subpath);
+ path->rows += (subpath->rows * parallel_divisor);
+ path->total_cost += (subpath->total_cost * parallel_divisor);
+
+ /*
+ * Append can start returning tuples as soon as the child node with
+ * the lowest startup cost has finished setting up. We consider only
+ * the first few subplans, those that immediately get a worker
+ * assigned.
+ */
+ if (worker <= path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ worker++;
+ }
+ }
+
+ /* The row count and cost should represent per-worker figures. */
+ parallel_divisor = get_parallel_divisor(path);
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ path->total_cost /= parallel_divisor;
+
+ /*
+ * No matter how fast the partial plans finish, the Append path is
+ * going to take at least the time needed for the costliest non-partial
+ * path to finish. This is actually an approximation. We can even
+ * consider all the other non-partial plans and how workers would get
+ * scheduled to determine the cost of finishing the non-partial paths.
+ * But we can't calculate the total cost exactly anyway, especially
+ * because we can't determine exactly when some of the workers would
+ * start executing partial plans.
+ */
+ path->total_cost = Max(highest_nonpartial_cost, path->total_cost);
+ }
+ else
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+
+ path->total_cost += subpath->total_cost;
+ if (l == list_head(subpaths)) /* first node? */
+ path->startup_cost = subpath->startup_cost;
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c80c999..c517900 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5163,7 +5165,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5174,6 +5176,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 68d74cb..1529396 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3383,8 +3383,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d88738e..4069855 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fca96eb..9f962e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,69 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for a partial Append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The formula log2(number_of_subpaths) + 1 seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths
+ * with only 1-2 parallel_workers each. But if the subpaths'
+ * parallel_workers values are high, this formula is not suitable,
+ * because it does not take the per-subpath workers into account. For
+ * example, with per-subpath workers (2, 8, 8), the Append should get at
+ * least 8 workers, whereas the formula gives 2. In that case it seems
+ * better to follow the method used for calculating the parallel_workers
+ * of an unpartitioned table, i.e. log3(table_size): treat the UNION
+ * query as if its data belonged to a single unpartitioned table, and
+ * derive the workers from that. That works out to
+ * logb(b^w1 + b^w2 + b^w3), where w1, w2, ... are the per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out
+ * that this evaluates to a value just a bit greater than
+ * max(w1, w2, w3), so we just use the maximum of the per-subpath
+ * workers. But that in turn gives too few workers when all subpaths
+ * have a single worker (meaning they are non-partial): for example,
+ * with workers (1, 1, 1, 1, 1, 1) it is better to allocate 3 workers,
+ * whereas the maximum gives only 1. So we use whichever of the two
+ * methods gives the higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1278,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..f8f25e6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 128;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 291bf76..3942e8a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a02b154..5383509 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4a95e16..1950192 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1c88a79..70ccdbf 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1112,10 +1112,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
@@ -1123,6 +1127,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81640de..2203ab4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On 24 March 2017 at 00:38, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 23 March 2017 at 16:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 23 March 2017 at 05:55, Robert Haas <robertmhaas@gmail.com> wrote:
So in your example we do this:
C[0] += 20;
C[1] += 16;
C[2] += 10;
/* C[2] is smaller than C[0] or C[1] at this point, so we add the next
path to C[2] */
C[2] += 8;
/* after the previous line, C[1] is now the smallest, so add to that
entry next */
C[1] += 3;
/* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
C[2] += 1;
/* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
Now we just take the highest entry that appears in the array, which in
this case is C[0], as the total cost.
Wow. The way your final result exactly tallies with my algorithm's
result is very interesting. This looks like some maths or computer
science theory that I am not aware of.
I am currently coding the algorithm using your method.
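To make the scheme above concrete, here is a small standalone C sketch of
that greedy costing (illustrative only; greedy_nonpartial_cost is a made-up
name, and this is not the patch's append_nonpartial_cost()):

#include <stdio.h>

/*
 * Deal the non-partial subpath costs, already sorted in descending
 * order, onto 'nworkers' accumulators, always adding to the currently
 * smallest one; the largest accumulator at the end approximates when
 * the non-partial work finishes.
 */
static double
greedy_nonpartial_cost(const double *costs, int ncosts, int nworkers)
{
	double	c[16] = {0};	/* assumes nworkers <= 16 for this demo */
	int		i, j, min, max;

	for (i = 0; i < ncosts; i++)
	{
		/* find the accumulator with the smallest total so far */
		min = 0;
		for (j = 1; j < nworkers; j++)
			if (c[j] < c[min])
				min = j;
		c[min] += costs[i];
	}

	/* the busiest accumulator determines the overall cost */
	max = 0;
	for (j = 1; j < nworkers; j++)
		if (c[j] > c[max])
			max = j;
	return c[max];
}

int
main(void)
{
	double	costs[] = {20, 16, 10, 8, 3, 1};

	/* prints 20.0: the accumulators end at 20, 19 and 19 */
	printf("%.1f\n", greedy_nonpartial_cost(costs, 6, 3));
	return 0;
}

Running it on the example costs with 3 workers reproduces the trace above.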
While I was coding this, I was considering whether Path->rows should
also be calculated the same way as the total cost is for non-partial
and partial subpaths. I think for rows we can just take total_rows
divided by the number of workers for non-partial paths, and this
approximation should suffice; it would look odd to treat rows with the
same algorithm we chose for the total cost of non-partial paths.
Attached is patch v12, where the Path->rows calculation for
non-partial paths is kept separate from the way their total cost is
computed: rows for non-partial paths are calculated as total_rows
divided by the number of workers, as an approximation, and the rows
for partial paths are then just added one by one.
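A minimal sketch of that row estimate (the function name and signature
here are made up for illustration; this is not the patch code):

/*
 * Non-partial subpaths contribute their total rows divided by the
 * number of workers, while partial subpaths, whose row counts are
 * already per-worker, are simply summed.
 */
static double
estimate_parallel_append_rows(const double *nonpartial_rows, int n_nonpartial,
							  const double *partial_rows, int n_partial,
							  int nworkers)
{
	double	rows = 0.0;
	int		i;

	for (i = 0; i < n_nonpartial; i++)
		rows += nonpartial_rows[i] / nworkers;	/* approximation */

	for (i = 0; i < n_partial; i++)
		rows += partial_rows[i];	/* already per-worker */

	return rows;
}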
Meanwhile, attached is a WIP patch v10. The only change in this patch
w.r.t. the last patch (v9) is that this one has a new function defined,
append_nonpartial_cost(). Just sending this to show what the algorithm
looks like; it isn't called yet.
Now append_nonpartial_cost() is used, and it is tested.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
ParallelAppend_v12.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de3540..a7aad08 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3643,6 +3643,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e930731..6f51372 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -832,7 +832,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1096,6 +1096,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b91b663..8b0ec2c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished: per-subplan completion flags. A worker which finishes
+ * a subplan should set its pa_finished flag to true, so that no new
+ * worker picks this subplan. For a non-partial subplan, the worker that
+ * picks it up should immediately set the flag to true, to make sure
+ * no more than one worker is ever assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check if we are already finished plans from parallel append. This
+ * can happen if all the subplans are finished when this worker
+ * has not even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is an ordinary (non-parallel-aware) Append. Use the usual
+ * sequential logic for choosing the next subplan.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point for new workers to run this subplan, so
+ * mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * Estimates the space required for the shared Parallel Append state.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the next subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker starts
+ * looking for an unfinished plan at the shared next_subplan counter index,
+ * executes that subplan, then advances the counter to the next subplan,
+ * so that other workers know which plan to choose next. This way, workers
+ * choose the subplans in round-robin order, and thus get evenly
+ * distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * it is already finished, and then once again this worker will try to
+ * choose the next subplan; but this is OK: it's just an extra
+ * "choose_next_subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute, stop executing, and indicate
+ * the same for other workers to know that there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only by the parallel leader. The leader scans backwards
+ * from the last plan, to prevent it from taking up the most expensive
+ * non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 93bda42..f8448be 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built but the cells of the original
+ * list are re-used. Caller has to pass a copy of the list if the original list
+ * needs to be untouched. Effectively, the comparator function is passed
+ * pointers to ListCell* pointers.
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1b9005f..7b22ca5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 474f221..44da33a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1568,6 +1568,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a1e1a87..6611e45 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1264,7 +1267,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1300,7 +1307,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. Drop the idea for parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Either one is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1378,23 +1443,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1404,9 +1485,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1459,7 +1540,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1677,6 +1759,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all of its child plans have to
+ * be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all of its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1696,7 +1850,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a129d1e..84ab4ce 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,8 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(List *subpaths, int numpaths,
+ int parallel_workers);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1704,6 +1707,190 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of the non-partial paths of an Append
+ * node.
+ *
+ * This is the number of cost units it takes for all the workers, together,
+ * to finish all of the non-partial subpaths.
+ * subpaths contains the non-partial paths followed by the partial paths.
+ * numpaths gives the number of non-partial paths.
+ */
+static Cost
+append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
+{
+ Cost *costarr;
+ int arrlen;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int path_index;
+ int min_index;
+ int max_index;
+
+ if (numpaths == 0)
+ return 0;
+
+ /*
+ * Build the cost array containing the costs of the first n subpaths,
+ * where n = Min(parallel_workers, numpaths); that is also the length
+ * of the array.
+ */
+ arrlen = Min(parallel_workers, numpaths);
+ costarr = (Cost *) palloc(sizeof(Cost) * arrlen);
+ path_index = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+
+ if (path_index == arrlen)
+ break;
+ costarr[path_index++] = subpath->total_cost;
+ }
+
+ /*
+ * The non-partial subpaths are sorted by descending total cost, so the
+ * array starts out sorted by decreasing cost. Hence the last element is
+ * the one with the minimum cost.
+ */
+ min_index = arrlen - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+ int i;
+
+ /* Consider only the non-partial paths */
+ if (path_index++ == numpaths)
+ break;
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+ for (max_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (list_length(subpaths) == 0)
+ return;
+
+ if (!path->parallel_aware)
+ {
+ Path *subpath = (Path *) linitial(subpaths);
+
+ /*
+ * Startup cost of non-parallel-aware Append is the startup cost of
+ * first subpath.
+ */
+ path->startup_cost = subpath->startup_cost;
+
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ else /* parallel-aware */
+ {
+ double max_rows = 0;
+ double nonpartial_rows = 0;
+ int i = 0;
+
+ /* Include the non-partial paths total cost */
+ path->total_cost += append_nonpartial_cost(subpaths,
+ num_nonpartial_subpaths,
+ path->parallel_workers);
+
+ /* Calculate startup cost; also add up all the rows for later use */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * Append would start returning tuples when the child node with the
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (i == 0)
+ path->startup_cost = subpath->startup_cost;
+ else if (i < path->parallel_workers)
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+
+ if (i < num_nonpartial_subpaths)
+ {
+ nonpartial_rows += subpath->rows;
+
+ /* Also keep track of max rows for any given subpath */
+ max_rows = Max(max_rows, subpath->rows);
+ }
+
+ i++;
+ }
+
+ /*
+ * As an approximation, the non-partial rows are calculated as the total
+ * rows divided by the number of workers. But if the row counts are
+ * highly unequal across the paths, that figure can be too low, so we
+ * also clamp it to be no less than the maximum row count of any single
+ * path.
+ */
+ nonpartial_rows /= path->parallel_workers;
+ path->rows += Max(nonpartial_rows, max_rows);
+
+ /* Calculate partial paths cost. */
+ if (list_length(subpaths) > num_nonpartial_subpaths)
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ for_each_cell(l, list_nth_cell(subpaths, num_nonpartial_subpaths))
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c80c999..c517900 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -199,7 +199,8 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1026,7 +1027,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5163,7 +5165,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5174,6 +5176,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 68d74cb..1529396 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3383,8 +3383,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d88738e..4069855 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fca96eb..9f962e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,69 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths)+1 formula seems to give an appropriate
+ * number of workers for an Append path that either has a large number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths with
+ * only 1-2 parallel_workers each. But if a subpath's parallel_workers is
+ * high, this formula is not suitable, because it does not take per-subpath
+ * workers into account. For example, with per-subpath workers (2, 8, 8),
+ * the Append should get at least 8 workers, whereas the formula gives 2.
+ * In that case, it seems better to follow the method used for calculating
+ * parallel_workers of an unpartitioned table: log3(table_size). So we
+ * treat the UNION query as if the data belonged to a single unpartitioned
+ * table, and derive its workers from that: logb(b^w1 + b^w2 + b^w3), where
+ * w1, w2, ... are the per-subplan workers and b is some logarithmic base
+ * such as 2 or 3. It turns out that this evaluates to a value just a bit
+ * greater than max(w1, w2, w3), so we simply use the maximum of the
+ * per-subplan workers. But that formula gives too few workers when every
+ * path has a single worker (meaning they are all non-partial); for example,
+ * with workers (1, 1, 1, 1, 1, 1) it is better to allocate 3 workers,
+ * whereas this method allocates only 1. So we use whichever method gives
+ * the higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1264,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1278,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..f8f25e6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 128;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 291bf76..3942e8a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -911,6 +911,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a02b154..5383509 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..c822cf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -1187,12 +1188,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4a95e16..1950192 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1c88a79..70ccdbf 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1112,10 +1112,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
@@ -1123,6 +1127,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d9a9b12..43dc72f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -103,6 +104,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81640de..2203ab4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
Hi,
On 2017-03-24 21:32:57 +0530, Amit Khandekar wrote:
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished: per-subplan completion flags. A worker which finishes
+ * a subplan should set its pa_finished flag to true, so that no new
+ * worker picks this subplan. For a non-partial subplan, the worker that
+ * picks it up should immediately set the flag to true, to make sure
+ * no more than one worker is ever assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+typedef ParallelAppendDescData *ParallelAppendDesc;
Pointer hiding typedefs make this Andres sad.
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
Is it actually guaranteed that none of the parallel workers are doing
something at that point?
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter index to start looking for unfinished plan,
+ * executes the subplan, then shifts ahead this counter to the next
+ * subplan, so that other workers know which next plan to choose. This
+ * way, workers choose the subplans in round robin order, and thus they
+ * get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
It's a bit weird that the leader's case is so separate, and does
its own lock acquisition.
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return NULL tuple since it
+ * is already finished, and then once again this worker will try to
+ * choose next subplan; but this is ok : it's just an extra
+ * "choose_next_subplan" operation.
+ */
IIRC not all node types are safe against being executed again when
they've previously returned NULL. That's why e.g. nodeMaterial.c
contains the following blurb:
/*
* If necessary, try to fetch another row from the subplan.
*
* Note: the eof_underlying state variable exists to short-circuit further
* subplan calls. It's not optional, unfortunately, because some plan
* node types are not robust about being called again when they've already
* returned NULL.
*/
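For reference, this is roughly the shape of the guard ExecMaterial
uses (an abridged sketch, not the exact source):

    if (!node->eof_underlying)
    {
        TupleTableSlot *outerslot = ExecProcNode(outerPlanState(node));

        if (TupIsNull(outerslot))
            node->eof_underlying = true;   /* never call the subplan again */
        else
            tuplestore_puttupleslot(node->tuplestorestate, outerslot);
    }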
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If at all MergeAppend is partial, all its child plans have to be
+ * partial : we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
Why is that?
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * log2(number_of_subpaths)+1 formula seems to give an appropriate number of
+ * workers for Append path either having high number of children (> 100) or
+ * having all non-partial subpaths or subpaths with 1-2 parallel_workers.
+ * Whereas, if the subpaths->parallel_workers is high, this formula is not
+ * suitable, because it does not take into account per-subpath workers.
+ * For e.g., with workers (2, 8, 8),
That's the per-subplan workers for three subplans? That's not
necessarily clear.
+ * the Append workers should be at least
+ * 8, whereas the formula gives 2. In this case, it seems better to follow
+ * the method used for calculating parallel_workers of an unpartitioned
+ * table : log3(table_size). So we treat the UNION query as if the data
Which "UNION query"?
+ * belongs to a single unpartitioned table, and then derive its workers. So
+ * it will be : logb(b^w1 + b^w2 + b^w3) where w1, w2.. are per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out that
+ * this evaluates to a value just a bit greater than max(w1,w2, w3). So, we
+ * just use the maximum of workers formula. But this formula gives too few
+ * workers when all paths have single worker (meaning they are non-partial)
+ * For e.g. with workers : (1, 1, 1, 1, 1, 1), it is better to allocate 3
+ * workers, whereas this method allocates only 1.
+ * So we use whichever method that gives higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
Hm. I'm not really convinced by the logic here. Wouldn't it be better
to try to compute the minimum total cost across all workers for
1..#max_workers for the plans in an iterative manner? I.e. try to map
each of the subplans to 1 (if non-partial) or N workers (partial) using
some fitting algorithm (e.g. always choosing the worker(s) that currently
have the least work assigned). I think the current algorithm doesn't
lead to useful #workers for e.g. cases with a lot of non-partial,
high-startup plans - imo a quite reasonable scenario.
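To make the idea concrete, here is a back-of-the-envelope sketch of such
a fitting pass (estimate_append_makespan and its arguments are invented
for illustration; this is not code from the patch):

    /*
     * Greedily assign each subplan to the currently least-loaded
     * worker(s); the resulting makespan approximates the Append's
     * total cost for a given worker count.
     */
    static Cost
    estimate_append_makespan(Cost *subplan_costs, bool *is_partial,
                             int nsubplans, int nworkers)
    {
        Cost       *load = (Cost *) palloc0(sizeof(Cost) * nworkers);
        Cost        makespan = 0;
        int         i, w;

        for (i = 0; i < nsubplans; i++)
        {
            if (is_partial[i])
            {
                /* assume a partial subplan spreads evenly over all workers */
                for (w = 0; w < nworkers; w++)
                    load[w] += subplan_costs[i] / nworkers;
            }
            else
            {
                /* a non-partial subplan runs wholly on one worker; pick
                 * the least-loaded one */
                int         min = 0;

                for (w = 1; w < nworkers; w++)
                    if (load[w] < load[min])
                        min = w;
                load[min] += subplan_costs[i];
            }
        }

        for (w = 0; w < nworkers; w++)
            makespan = Max(makespan, load[w]);

        pfree(load);
        return makespan;
    }

Repeating this for nworkers = 1 .. max_parallel_workers_per_gather and
keeping the cheapest result would give the iterative search described
above.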
I'm afraid this is too late for v10 - do you agree?
- Andres
On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really convinced by the logic here. Wouldn't it be better
to try to compute the minimum total cost across all workers for
1..#max_workers for the plans in an iterative manner? I.e. try to map
each of the subplans to 1 (if non-partial) or N workers (partial) using
some fitting algorithm (e.g. always choosing the worker(s) that currently
have the least work assigned). I think the current algorithm doesn't
lead to useful #workers for e.g. cases with a lot of non-partial,
high-startup plans - imo a quite reasonable scenario.
Well, that'd be totally unlike what we do in any other case. We only
generate a Parallel Seq Scan plan for a given table with one # of
workers, and we cost it based on that. We have no way to re-cost it
if we changed our mind later about how many workers to use.
Eventually, we should probably have something like what you're
describing here, but in general, not just for this specific case. One
problem, of course, is to avoid having a larger number of workers
always look better than a smaller number, which with the current
costing model would probably happen a lot.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-04-03 22:13:18 -0400, Robert Haas wrote:
On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really convinced by the logic here. Wouldn't it be better
to try to compute the minimum total cost across all workers for
1..#max_workers for the plans in an iterative manner? I.e. try to map
each of the subplans to 1 (if non-partial) or N workers (partial) using
some fitting algorithm (e.g. always choosing the worker(s) that currently
have the least work assigned). I think the current algorithm doesn't
lead to useful #workers for e.g. cases with a lot of non-partial,
high-startup plans - imo a quite reasonable scenario.

Well, that'd be totally unlike what we do in any other case. We only
generate a Parallel Seq Scan plan for a given table with one # of
workers, and we cost it based on that. We have no way to re-cost it
if we changed our mind later about how many workers to use.
Eventually, we should probably have something like what you're
describing here, but in general, not just for this specific case. One
problem, of course, is to avoid having a larger number of workers
always look better than a smaller number, which with the current
costing model would probably happen a lot.
I don't think the parallel seqscan is comparable in complexity with the
parallel append case. Each worker there does the same kind of work, and
if one of them is behind, it'll just do less. But correct sizing will
be more important with parallel-append, because with non-partial
subplans the work is absolutely *not* uniform.
Greetings,
Andres Freund
Thanks Andres for your review comments. I will get back to you on the
other comments, but meanwhile I have some queries about one particular
comment below ...
On 4 April 2017 at 10:17, Andres Freund <andres@anarazel.de> wrote:
On 2017-04-03 22:13:18 -0400, Robert Haas wrote:
On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really convinced by the logic here. Wouldn't it be better
to try to compute the minimum total cost across all workers for
1..#max_workers for the plans in an iterative manner? I.e. try to map
each of the subplans to 1 (if non-partial) or N workers (partial) using
some fitting algorithm (e.g. always choosing the worker(s) that currently
have the least work assigned). I think the current algorithm doesn't
lead to useful #workers for e.g. cases with a lot of non-partial,
high-startup plans - imo a quite reasonable scenario.
I think I might not have understood this part exactly. Are you saying
we need to consider per-subplan parallel_workers to calculate the total
number of workers for Append? I also didn't get the point about
non-partial subplans. Can you please explain how many workers you think
should be expected with, say, 7 subplans out of which 3 are non-partial?
Well, that'd be totally unlike what we do in any other case. We only
generate a Parallel Seq Scan plan for a given table with one # of
workers, and we cost it based on that. We have no way to re-cost it
if we changed our mind later about how many workers to use.
Eventually, we should probably have something like what you're
describing here, but in general, not just for this specific case. One
problem, of course, is to avoid having a larger number of workers
always look better than a smaller number, which with the current
costing model would probably happen a lot.

I don't think the parallel seqscan is comparable in complexity with the
parallel append case. Each worker there does the same kind of work, and
if one of them is behind, it'll just do less. But correct sizing will
be more important with parallel-append, because with non-partial
subplans the work is absolutely *not* uniform.

Greetings,
Andres Freund
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 4 April 2017 at 01:47, Andres Freund <andres@anarazel.de> wrote:
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished : workers currently executing the subplan. A worker which
+ * finishes a subplan should set pa_finished to true, so that no new
+ * worker picks this subplan. For non-partial subplan, a worker which picks
+ * up that subplan should immediately set to true, so as to make sure
+ * there are no more than 1 worker assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
Pointer hiding typedefs make this Andres sad.
Yeah ... I was trying to be consistent with other parts of the code
where we have typedefs for both a structure and a pointer to that
structure.
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+

Is it actually guaranteed that none of the parallel workers are doing
something at that point?
ExecReScanAppend() would be called by ExecReScanGather().
ExecReScanGather() shuts down all the parallel workers before calling
its child node (i.e. ExecReScanAppend).
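(Roughly, from nodeGather.c, abridged from memory:

    static void
    ExecReScanGather(GatherState *node)
    {
        /* Make sure any existing workers are gracefully shut down */
        ExecShutdownGatherWorkers(node);
        ...
    }

so by the time the Append is rescanned, no worker is running it.)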
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);

It's a bit weird that the leader's case is so separate, and does
its own lock acquisition.
Since we wanted to prevent it from taking the most expensive
non-partial plans first , thought it would be better to keep its logic
simple and separate, so could not merge it in the main logic code.
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return NULL tuple since it
+ * is already finished, and then once again this worker will try to
+ * choose next subplan; but this is ok : it's just an extra
+ * "choose_next_subplan" operation.
+ */

IIRC not all node types are safe against being executed again when
they've previously returned NULL. That's why e.g. nodeMaterial.c
contains the following blurb:
/*
* If necessary, try to fetch another row from the subplan.
*
* Note: the eof_underlying state variable exists to short-circuit further
* subplan calls. It's not optional, unfortunately, because some plan
* node types are not robust about being called again when they've already
* returned NULL.
*/
This scenario is different from the parallel append scenario described
by my comment. A worker sets pa_finished to true only when it itself
gets a NULL tuple for a given subplan. So in
exec_append_parallel_next(), suppose a worker W1 finds a subplan with
pa_finished=false. So it chooses it. Now a different worker W2 sets
this subplan's pa_finished=true because W2 has got a NULL tuple. But
W1 hasn't yet got a NULL tuple. If it had got a NULL tuple earlier, it
would have itself set pa_finished to true, and then it would have
never again chosen this subplan. So effectively, a worker never
executes a subplan again once that subplan has returned NULL to that
same worker.
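To spell out the interleaving (a hypothetical schedule; W1 and W2 are
two workers):

    W1: sees pa_finished[i] == false, picks subplan i
    W2: gets a NULL tuple from subplan i, sets pa_finished[i] = true
    W1: runs its own copy of subplan i, which returns NULL right away
    W1: sets pa_finished[i] = true (a no-op), moves to the next subplan

Each worker executes its own instance of the plan node, so no instance
is ever called again after it has returned NULL to that same worker.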
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If at all MergeAppend is partial, all its child plans have to be
+ * partial : we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */

Why is that?
The mix of partial and non-partial subplans is being implemented only
for the Append plan. In the future, if and when we extend this support
to MergeAppend, we will need to change this. Till then, we can assume
that if a MergeAppend is partial, all its child plans have to be
partial; otherwise there wouldn't have been a partial MergeAppendPath.
BTW, a MergeAppendPath currently is itself never partial; that is why
the comment says "If at all MergeAppend is partial".
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * log2(number_of_subpaths)+1 formula seems to give an appropriate number of
+ * workers for Append path either having high number of children (> 100) or
+ * having all non-partial subpaths or subpaths with 1-2 parallel_workers.
+ * Whereas, if the subpaths->parallel_workers is high, this formula is not
+ * suitable, because it does not take into account per-subpath workers.
+ * For e.g., with workers (2, 8, 8),

That's the per-subplan workers for three subplans? That's not
necessarily clear.
Right. Corrected it to: "3 subplans having per-subplan workers such
as (2, 8, 8)".
+ * the Append workers should be at least
+ * 8, whereas the formula gives 2. In this case, it seems better to follow
+ * the method used for calculating parallel_workers of an unpartitioned
+ * table : log3(table_size). So we treat the UNION query as if the data

Which "UNION query"?
Changed it to "partitioned table". The idea is : treat all the data of
a partitioned table as if it belonged to a single non-partitioned
table, and then calculate the workers for such a table. It may not
exactly apply for UNION query because that can involve different
tables and with joins too. So replaced UNION query to partitioned
table.
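As a worked check of that formula, with base b = 3 and per-subplan
workers (2, 8, 8): log3(3^2 + 3^8 + 3^8) = log3(9 + 6561 + 6561) =
log3(13131) ~ 8.6, i.e. just a bit above max(2, 8, 8) = 8, which is why
we just use the maximum of the per-subplan workers.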
+ * belongs to a single unpartitioned table, and then derive its workers. So
+ * it will be : logb(b^w1 + b^w2 + b^w3) where w1, w2.. are per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out that
+ * this evaluates to a value just a bit greater than max(w1,w2, w3). So, we
+ * just use the maximum of workers formula. But this formula gives too few
+ * workers when all paths have single worker (meaning they are non-partial)
+ * For e.g. with workers : (1, 1, 1, 1, 1, 1), it is better to allocate 3
+ * workers, whereas this method allocates only 1.
+ * So we use whichever method that gives higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}

Hm. I'm not really convinced by the logic here. Wouldn't it be better
to try to compute the minimum total cost across all workers for
1..#max_workers for the plans in an iterative manner? I.e. try to map
each of the subplans to 1 (if non-partial) or N workers (partial) using
some fitting algorithm (e.g. always choosing the worker(s) that currently
have the least work assigned). I think the current algorithm doesn't
lead to useful #workers for e.g. cases with a lot of non-partial,
high-startup plans - imo a quite reasonable scenario.
Have responded in a separate reply.
I'm afraid this is too late for v10 - do you agree?
I am not exactly sure; maybe it depends upon how many more review
comments follow this week. I anticipate there will not be any
high-level / design-level changes now.
Attached is an updated patch v13 that has some comments changed as per
your review, and is also rebased on the latest master.
Attachments:
ParallelAppend_v13.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ac339fb..59d24c0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3643,6 +3643,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9856968..f4c78e4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -843,7 +843,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1107,6 +1107,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 469a32c..cc8422c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -215,6 +216,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -279,6 +284,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -782,6 +791,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a107545..e9e8676 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i] is true when no new worker should pick subplan i. A
+ * worker which finishes a subplan sets its pa_finished element to true,
+ * so that no new worker picks this subplan. For a non-partial subplan,
+ * the worker which picks it up sets the element to true immediately, so
+ * as to make sure no more than 1 worker is ever assigned to this subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check whether we have already finished all the plans of a parallel
+ * append. This can happen if all the subplans were finished before this
+ * worker even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is an ordinary (non-parallel-aware) Append; follow the
+ * sequential logic for choosing the next subplan.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point for new workers to run this subplan, so
+ * mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * Estimates the space required to store the shared Parallel Append state.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter index to start looking for unfinished plan,
+ * executes the subplan, then shifts ahead this counter to the next
+ * subplan, so that other workers know which next plan to choose. This
+ * way, workers choose the subplans in round robin order, and thus they
+ * get evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: There is a chance that just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return NULL tuple since it
+ * is already finished, and then once again this worker will try to
+ * choose next subplan; but this is ok : it's just an extra
+ * "choose_next_subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute, stop executing, and indicate
+ * the same for other workers to know that there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only if it's a parallel leader. The backend should scan
+ * backwards from the last plan. This is to prevent it from taking up
+ * the most expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 61bc502..e939dda 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -239,6 +239,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f09aa24..d5e3ca7 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built but the cells of the original
+ * list are re-used. Caller has to pass a copy of the list if the original list
+ * needs to be untouched. Effectively, the comparator function is passed
+ * pointers to ListCell* pointers.
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 83fb39f..b60bc16 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -372,6 +372,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 766f2d8..cb246f2 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1575,6 +1575,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b93b4fc..67f7c89 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1280,7 +1283,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1316,7 +1323,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. Drop the idea for parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Either one is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1394,23 +1459,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1420,9 +1501,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1475,7 +1556,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1693,6 +1775,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate
+ * lists are created, one containing only partial subpaths, and the other
+ * containing only non-partial subpaths. Also, the non-partial paths are
+ * kept ordered by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If at all MergeAppend is partial, all its child plans have to be
+ * partial : we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1712,7 +1866,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ed07e2f..4179145 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,8 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(List *subpaths, int numpaths,
+ int parallel_workers);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1741,6 +1744,190 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of non-partial paths of Append node.
+ *
+ * It is the number of cost units taken until all the workers have
+ * finished all the non-partial subpaths.
+ * subpaths contains non-partial paths followed by partial paths.
+ * numpaths tells the number of non-partial paths.
+ */
+static Cost
+append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
+{
+ Cost *costarr;
+ int arrlen;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int path_index;
+ int min_index;
+ int max_index;
+
+ if (numpaths == 0)
+ return 0;
+
+ /*
+ * Build the cost array containing costs of first n number of subpaths,
+ * where n = parallel_workers. Also, its size is kept only as long as the
+ * number of subpaths, or parallel_workers, whichever is minimum.
+ */
+ arrlen = Min(parallel_workers, numpaths);
+ costarr = (Cost *) palloc(sizeof(Cost) * arrlen);
+ path_index = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+
+ if (path_index == arrlen)
+ break;
+ costarr[path_index++] = subpath->total_cost;
+ }
+
+ /*
+ * Since the subpaths are non-partial paths, the array is initially sorted
+ * by decreasing cost. So choose the last one for the index with minimum
+ * cost.
+ */
+ min_index = arrlen - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+ int i;
+
+ /* Consider only the non-partial paths */
+ if (path_index++ == numpaths)
+ break;
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+ for (max_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (list_length(subpaths) == 0)
+ return;
+
+ if (!path->parallel_aware)
+ {
+ Path *subpath = (Path *) linitial(subpaths);
+
+ /*
+ * Startup cost of non-parallel-aware Append is the startup cost of
+ * first subpath.
+ */
+ path->startup_cost = subpath->startup_cost;
+
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ else /* parallel-aware */
+ {
+ double max_rows = 0;
+ double nonpartial_rows = 0;
+ int i = 0;
+
+ /* Include the non-partial paths total cost */
+ path->total_cost += append_nonpartial_cost(subpaths,
+ num_nonpartial_subpaths,
+ path->parallel_workers);
+
+ /* Calculate startup cost; also add up all the rows for later use */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (i < path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ }
+
+ if (i < num_nonpartial_subpaths)
+ {
+ nonpartial_rows += subpath->rows;
+
+ /* Also keep track of max rows for any given subpath */
+ max_rows = Max(max_rows, subpath->rows);
+ }
+
+ i++;
+ }
+
+ /*
+ * As an approximation, non-partial rows are calculated as total rows
+ * divided by the number of workers. But if the row counts are highly
+ * unequal across the paths, this figure can be misleading, so we also
+ * require it to be no less than the maximum of all the path rows.
+ */
+ nonpartial_rows /= path->parallel_workers;
+ path->rows += Max(nonpartial_rows, max_rows);
+
+ /* Calculate partial paths cost. */
+ if (list_length(subpaths) > num_nonpartial_subpaths)
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ for_each_cell(l, list_nth_cell(subpaths, num_nonpartial_subpaths))
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a0c67b..6e39fc1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b121f40..b1f9dc6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -203,7 +203,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
Index scanrelid, char *enrname);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1038,7 +1039,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5245,7 +5247,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5256,6 +5258,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 17cd683..85a1110 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3587,8 +3587,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b5cb4de..8f18841 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,8 +566,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8536212..a589c22 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,70 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The log2(number_of_subpaths)+1 formula gives an appropriate number of
+ * workers for an Append path that has a high number of children (> 100),
+ * or whose subpaths are all non-partial, or have just 1-2 parallel_workers.
+ * But if the subpaths' parallel_workers values are high, this formula is
+ * not suitable, because it does not take per-subpath workers into account.
+ * For example, with 3 subplans having per-subplan workers (2, 8, 8), the
+ * Append should get at least 8 workers, whereas the formula gives 2. In
+ * that case, it seems better to follow the method used for calculating
+ * parallel_workers of an unpartitioned table : log3(table_size). That is,
+ * treat the partitioned table as if its data belonged to a single
+ * unpartitioned table, and derive its workers from that. This comes out
+ * as : logb(b^w1 + b^w2 + b^w3), where w1, w2, ... are the per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out that
+ * this evaluates to a value just a bit greater than max(w1, w2, w3), so we
+ * simply use the maximum-of-workers formula. But that formula gives too
+ * few workers when all paths have a single worker (meaning they are
+ * non-partial) : e.g. with workers (1, 1, 1, 1, 1, 1) it is better to
+ * allocate 3 workers, whereas it allocates only 1.
+ * So we use whichever method gives the higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1265,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,44 +1279,51 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered
* unsorted */
pathnode->partitioned_rels = partitioned_rels;
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..f8f25e6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 128;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8b5f064..580d8e0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -912,6 +912,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8a93bdc..141bd92 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -288,6 +288,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 6fb4662..e76027f 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fa99244..58fc0ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -944,12 +945,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 90e84bc..8350220 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -248,6 +248,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a2dd26f..f481532 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -235,6 +235,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index fc53eb1..1e64c1c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1151,10 +1151,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
+ *
*/
typedef struct AppendPath
{
@@ -1162,6 +1166,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6909359..d71463d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -105,6 +106,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 2e712c6..04f1f32 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..802a380 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 6163ed8..49d232f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 038a62e..6ffe23d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index d43b75c..2270c53 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 9311a77..0623319 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
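
For reference in the discussion that follows, here is a minimal standalone
sketch of the worker-count heuristic implemented by get_append_num_workers
in the patch above. It is illustrative only : all names are hypothetical,
the patch's fls() is approximated by floor(log2(n)) + 1, the early bail-out
is omitted, and max_workers stands in for max_parallel_workers_per_gather.

/* Compile with: cc sketch.c -lm */
#include <stdio.h>
#include <math.h>

static int
append_num_workers(const int *partial_workers, int npartial,
				   int nnonpartial, int max_workers)
{
	/* Formula 1 : log2(number of subplans) + 1, like fls() in the patch */
	int			log2w = (int) floor(log2(npartial + nnonpartial)) + 1;
	int			max_per_plan = 1;
	int			num_workers;
	int			i;

	/* Formula 2 : the highest parallel_workers among the partial subpaths */
	for (i = 0; i < npartial; i++)
		if (partial_workers[i] > max_per_plan)
			max_per_plan = partial_workers[i];

	/* Use whichever formula gives the higher number of workers */
	num_workers = (log2w > max_per_plan ? log2w : max_per_plan) + 1;

	return num_workers < max_workers ? num_workers : max_workers;
}

int
main(void)
{
	int			mixed[] = {2, 8, 8};	/* three partial subplans */

	/* (2, 8, 8) : the per-plan maximum dominates; prints 9 */
	printf("%d\n", append_num_workers(mixed, 3, 0, 16));
	/* six non-partial subplans : the log2 formula dominates; prints 4 */
	printf("%d\n", append_num_workers(NULL, 0, 6, 16));
	return 0;
}

Both results would of course still be clamped to the actual
max_parallel_workers_per_gather setting, as in the patch.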
On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund <andres@anarazel.de> wrote:
> I don't think the parallel seqscan is comparable in complexity with the
> parallel append case. Each worker there does the same kind of work, and
> if one of them is behind, it'll just do less. But correct sizing will
> be more important with parallel-append, because with non-partial
> subplans the work is absolutely *not* uniform.

Sure, that's a problem, but I think it's still absolutely necessary to
ramp up the maximum "effort" (in terms of number of workers)
logarithmically. If you just do it by costing, the winning number of
workers will always be the largest number that we think we'll be able
to put to use - e.g. with 100 branches of relatively equal cost we'll
pick 100 workers. That's not remotely sane.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
> I'm afraid this is too late for v10 - do you agree?

Yeah, I think so. The benefit of this will be a lot more once we get
partitionwise join and partitionwise aggregate working, but that
probably won't happen for this release, or at best in limited cases.
And while we might not agree on exactly what work this patch still
needs, I think it does still need some work. I've moved this to the
next CommitFest.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
> On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund <andres@anarazel.de> wrote:
>> I don't think the parallel seqscan is comparable in complexity with the
>> parallel append case. Each worker there does the same kind of work, and
>> if one of them is behind, it'll just do less. But correct sizing will
>> be more important with parallel-append, because with non-partial
>> subplans the work is absolutely *not* uniform.
>
> Sure, that's a problem, but I think it's still absolutely necessary to
> ramp up the maximum "effort" (in terms of number of workers)
> logarithmically. If you just do it by costing, the winning number of
> workers will always be the largest number that we think we'll be able
> to put to use - e.g. with 100 branches of relatively equal cost we'll
> pick 100 workers. That's not remotely sane.

I'm quite unconvinced that just throwing a log() in there is the best
way to combat that. Modeling the issue of starting more workers through
tuple transfer, locking, startup overhead costing seems better to me.

If the goal is to compute the results of the query as fast as possible,
and to not use more than max_parallel_per_XXX, and it's actually
beneficial to use more workers, then we should. Because otherwise you
really can't use the resources available.
- Andres
On Wed, Apr 5, 2017 at 1:43 AM, Andres Freund <andres@anarazel.de> wrote:
> [...]
> If the goal is to compute the results of the query as fast as possible,
> and to not use more than max_parallel_per_XXX, and it's actually
> beneficial to use more workers, then we should. Because otherwise you
> really can't use the resources available.

+1. I had expressed a similar opinion earlier, but yours is better
articulated. Thanks.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 5 April 2017 at 01:43, Andres Freund <andres@anarazel.de> wrote:
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that. Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems better to me.
>
> If the goal is to compute the results of the query as fast as possible,
> and to not use more than max_parallel_per_XXX, and it's actually
> beneficial to use more workers, then we should. Because otherwise you
> really can't use the resources available.

This is what the earlier versions of my patch had done : just add up
per-subplan parallel_workers (1 for non-partial subplan and
subpath->parallel_workers for partial subplans) and set this total as
the Append parallel_workers.
Robert had a valid point that this would be inconsistent with the
worker count that we would come up with if it were a single table with
a cost as big as the total cost of all Append subplans. We were
discussing rather about partitioned table versus if it were
unpartitioned, but I think the same argument goes for a union query
with non-partial plans : if we want to clamp down the number of
workers for a single table for a good reason, we should then also
follow that policy and prevent assigning too many workers even for an
Append.
Now, I am not sure of the reason why we increase the number of workers
logarithmically for a single-table parallel scan; but I think there
might have been an observation that beyond a certain number of workers,
adding more workers does not make a significant difference. That is
just my guess, though.
If we try to calculate workers based on each of the subplan costs
rather than just the number of workers, I still think the total worker
count should be a *log* of the total cost, so as to be consistent with
what we did for other scans. Now, log(total_cost) does not increase
significantly with cost : for a cost of 1000 units, log3(cost) is 6,
and for a cost of 10,000 units it is 8, i.e. just 2 more workers.
So, since it is a logarithmic value anyway, we might as well drop the
cost factor and consider only the number of workers.
But again, if in the future we drop the log() method, the above is no
longer valid. Till then, though, I think we should follow the common
strategy we have been following.
BTW all of the above points apply only for non-partial plans. For
partial plans, what we have done in the patch is : Take the highest of
the per-subplan parallel_workers, and make sure that Append workers is
at least as high as this value.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 4, 2017 at 4:13 PM, Andres Freund <andres@anarazel.de> wrote:
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that. Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems better to me.

Knock yourself out. There's no doubt that the way the number of
parallel workers is computed is pretty stupid right now, and it
obviously needs to get a lot smarter before we can consider doing
things like throwing 40 workers at a query. If you throw 2 or 4
workers at a query and it turns out that it doesn't help much, that's
sad, but if you throw 40 workers at a query and it turns out that it
doesn't help much, or even regresses, that's a lot sadder. The
existing system does try to model startup and tuple transfer overhead
during costing, but only as a way of comparing parallel plans to each
other or to non-parallel plans, not to work out the right number of
workers. It also does not model contention, which it absolutely needs
to do. I was kind of hoping that once the first version of parallel
query was committed, other developers who care about the query planner
would be motivated to improve some of this stuff, but so far that
hasn't really happened. This release adds a decent number of new
execution capabilities, and there is a lot more work to be done there,
but without some serious work on the planner end of things I fear
we're never going to be able to get more than ~4x speedup out of
parallel query, because we're just too dumb to know how many workers
we really ought to be using.
That having been said, I completely and emphatically disagree that
this patch ought to be required to be an order-of-magnitude smarter
than the existing logic in order to get committed. There are four
main things that this patch can hope to accomplish:
1. If we've got an Append node with children that have a non-zero
startup cost, it is currently pretty much guaranteed that every worker
will pay the startup cost for every child. With Parallel Append, we
can spread out the workers across the plans, and once a plan has been
finished by however many workers it got, other workers can ignore it,
which means that its startup cost need not be paid by those workers.
This case will arise a lot more frequently once we have partition-wise
join.
2. When the Append node's children are partial plans, spreading out
the workers reduces contention for whatever locks those workers use to
coordinate access to shared data.
3. If the Append node represents a scan of a partitioned table, and
the partitions are on different tablespaces (or there's just enough
I/O bandwidth available in a single tablespace to read more than one
of them at once without slowing things down), then spreading out the
work gives us I/O parallelism. This is an area where some
experimentation and benchmarking is needed, because there is a
possibility of regressions if we run several sequential scans on the
same spindle in parallel instead of consecutively. We might need to
add some logic to try to avoid this, but it's not clear how that logic
should work.
4. If the Append node is derived from a UNION ALL query, we can run
different branches in different processes even if the plans are not
themselves able to be parallelized. This was proposed by Stephen
among others as an "easy" case for parallelism, which was maybe a tad
optimistic, but it's sad that we're going to release v10 without
having done anything about it.
All of those things (except possibly #3) are wins over the status quo
even if the way we choose the number of workers is still pretty dumb.
It shouldn't get away with being dumber than what we've already got,
but it shouldn't be radically smarter - or even just radically
different because, if it is, then the results you get when you query a
partitioned table will be very different from what you get when you
query an unpartitioned table, which is not sensible. I very much agree
that doing something smarter than log-scaling on the number of workers
is a good project for somebody to do, but it's not *this* project.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
This is what the earlier versions of my patch had done : just add up
per-subplan parallel_workers (1 for non-partial subplan and
subpath->parallel_workers for partial subplans) and set this total as
the Append parallel_workers.
I don't think that's great, consider e.g. the case that you have one
very expensive query, and a bunch of cheaper ones. Most of those workers
wouldn't do much while waiting for the the expensive query. What I'm
basically thinking we should do is something like the following
pythonesque pseudocode:
best_nonpartial_cost = -1
best_nonpartial_nworkers = -1

for numworkers in 1...#max workers:
    worker_work = [0 for x in range(0, numworkers)]

    nonpartial_cost += startup_cost * numworkers

    # distribute all nonpartial tasks over workers. Assign tasks to the
    # worker with the least amount of work already performed.
    for task in all_nonpartial_subqueries:
        least_busy_worker = worker_work.smallest()
        least_busy_worker += task.total_nonpartial_cost

    # the nonpartial cost here is the largest amount any single worker
    # has to perform.
    nonpartial_cost += worker_work.largest()

    total_partial_cost = 0
    for task in all_partial_subqueries:
        total_partial_cost += task.total_nonpartial_cost

    # Compute resources needed by partial tasks. First compute how much
    # cost we can distribute to workers that take shorter than the
    # "busiest" worker doing non-partial tasks.
    remaining_avail_work = 0
    for i in range(0, numworkers):
        remaining_avail_work += worker_work.largest() - worker_work[i]

    # Equally divide up remaining work over all workers
    if remaining_avail_work < total_partial_cost:
        nonpartial_cost += (worker_work.largest - remaining_avail_work) / numworkers

    # check if this is the best number of workers
    if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost:
        best_nonpartial_cost = worker_work.largest
        best_nonpartial_nworkers = nworkers
Does that make sense?
> BTW all of the above points apply only for non-partial plans.

Indeed. But I think that's going to be a pretty common type of plan,
especially if we get partitionwise joins.
Greetings,
Andres Freund
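
Read literally, the pseudocode above has a couple of rough edges (e.g.
worker_work is never actually updated after picking the least busy
worker, and nonpartial_cost is not reset per candidate worker count), so
here is a runnable sketch of one plausible reading of the model, in C to
match the patches. The greedy assignment, slack computation, and
per-worker startup charge follow the pseudocode; modeled_cost() and all
other names and constants here are hypothetical.

#include <stdio.h>

#define MAX_WORKERS 8

static double
modeled_cost(const double *nonpartial, int nnp,
			 double total_partial, int nworkers, double startup_cost)
{
	double		work[MAX_WORKERS] = {0};
	double		makespan = 0.0;
	double		slack = 0.0;
	int			i, t;

	/* Assign each non-partial subplan's cost to the least-loaded worker. */
	for (t = 0; t < nnp; t++)
	{
		int			least = 0;

		for (i = 1; i < nworkers; i++)
			if (work[i] < work[least])
				least = i;
		work[least] += nonpartial[t];
	}

	/* The makespan is the busiest worker's total. */
	for (i = 0; i < nworkers; i++)
		if (work[i] > makespan)
			makespan = work[i];

	/* Slack: partial work that fits below the busiest worker "for free". */
	for (i = 0; i < nworkers; i++)
		slack += makespan - work[i];

	/* Partial work beyond the slack extends the makespan evenly. */
	if (total_partial > slack)
		makespan += (total_partial - slack) / nworkers;

	return startup_cost * nworkers + makespan;
}

int
main(void)
{
	double		costs[] = {20, 10, 5, 2};	/* non-partial subplan costs */
	double		best_cost = -1;
	int			best = 1;
	int			w;

	for (w = 1; w <= MAX_WORKERS; w++)
	{
		double		c = modeled_cost(costs, 4, 0.0, w, 1.0);

		printf("%d workers -> modeled cost %.1f\n", w, c);
		if (best_cost < 0 || c < best_cost)
		{
			best_cost = c;
			best = w;
		}
	}
	printf("best: %d workers\n", best);
	return 0;
}

With these inputs (and an assumed per-worker startup cost of 1.0), the
model settles on 2 workers for the costs (20, 10, 5, 2), matching the
example Amit works through downthread.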
On 6 April 2017 at 07:33, Andres Freund <andres@anarazel.de> wrote:
> I don't think that's great, consider e.g. the case that you have one
> very expensive query, and a bunch of cheaper ones. Most of those
> workers wouldn't do much while waiting for the expensive query. What
> I'm basically thinking we should do is something like the following
> pythonesque pseudocode:
> [ pseudocode snipped, see above ]
> Does that make sense?

Yeah, I gather what you are trying to achieve is : allocate a number
of workers such that the total cost does not exceed the cost of the
first non-partial plan (i.e. the costliest one, because the plans are
sorted by descending cost).
So for non-partial costs such as (20, 10, 5, 2), allocate only 2
workers, because the 2nd worker will execute (10, 5, 2) while the 1st
worker executes (20).
But for costs such as (4, 4, 4, .... 20 times), the logic would give
us 20 workers, because we want to finish the Append in 4 time units;
and this is what we want to avoid when we go with the
don't-allocate-too-many-workers approach.
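
A quick arithmetic check of this example against the hypothetical
modeled_cost() sketch upthread (same assumptions as there) shows that it
is the modelled startup cost, not a fixed clamp, that decides the outcome:

/* twenty equal tasks of cost 4, per-worker startup cost s :
 * modeled cost = s * nworkers + 4 * ceil(20 / nworkers)
 * s = 0.1 : 20 workers -> 6.0, 10 workers -> 9.0 (20 workers win)
 * s = 2.0 : 20 workers -> 44.0, 5 workers -> 26.0 (fewer workers win)
 */

So with a startup cost that is small relative to the per-task cost, the
model does pick all 20 workers here; as the startup cost grows, it backs
off by itself, which is the kind of behaviour Andres argues should come
from costing rather than from a log() formula.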
>> BTW all of the above points apply only for non-partial plans.
>
> Indeed. But I think that's going to be a pretty common type of plan,

Yes it is.

> especially if we get partitionwise joins.

About that I am not sure, because we already have support for parallel
joins, so wouldn't the join subpaths corresponding to all of the
partitions be partial paths? I may be wrong about that.
But if the subplans are foreign scans, then yes, all of them would be
non-partial plans. This may provoke off-topic discussion, but instead
of assigning so many workers to all these foreign plans and having all
those workers wait for the results, a single asynchronous execution
node (which is still in the making) would be desirable, because it
would do the job of all these workers.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2017-04-07 11:44:39 +0530, Amit Khandekar wrote:
> Yeah, I gather what you are trying to achieve is : allocate a number
> of workers such that the total cost does not exceed the cost of the
> first non-partial plan (i.e. the costliest one, because the plans are
> sorted by descending cost).
>
> So for non-partial costs such as (20, 10, 5, 2), allocate only 2
> workers, because the 2nd worker will execute (10, 5, 2) while the 1st
> worker executes (20).
>
> But for costs such as (4, 4, 4, .... 20 times), the logic would give
> us 20 workers, because we want to finish the Append in 4 time units;
> and this is what we want to avoid when we go with the
> don't-allocate-too-many-workers approach.

I guess my problem is that I don't agree with that as a goal in and of
itself. If you actually want to run your query quickly, you *want* 20
workers here. The issues of backend startup overhead (already via
parallel_setup_cost), concurrency and such costs should be modelled, not
buried in a formula the user can't change. If we want to make it less
and less likely to start more workers we should make that configurable,
not the default.
I think there's some precedent taken from the parallel seqscan case,
that's not actually applicable here. Parallel seqscans have a good
amount of shared state, both on the kernel and pg side, and that shared
state reduces gains of increasing the number of workers. But with
non-partial scans such shared state largely doesn't exist.
>> especially if we get partitionwise joins.
>
> About that I am not sure, because we already have support for parallel
> joins, so wouldn't the join subpaths corresponding to all of the
> partitions be partial paths? I may be wrong about that.

We'll probably generate both, and then choose the cheaper one. The
startup cost for partitionwise joins should usually be a lot cheaper
(because e.g. for hashtables we'll generate smaller hashtables), and we
should have less cost of concurrency.
> But if the subplans are foreign scans, then yes, all of them would be
> non-partial plans. This may provoke off-topic discussion, but instead
> of assigning so many workers to all these foreign plans and having all
> those workers wait for the results, a single asynchronous execution
> node (which is still in the making) would be desirable, because it
> would do the job of all these workers.

That's something that probably shouldn't be modelled through a parallel
append, I agree - it shouldn't be too hard to develop a costing model
for that however.
Greetings,
Andres Freund
On 7 April 2017 at 20:35, Andres Freund <andres@anarazel.de> wrote:
>> But for costs such as (4, 4, 4, .... 20 times), the logic would give
>> us 20 workers, because we want to finish the Append in 4 time units;
>> and this is what we want to avoid when we go with the
>> don't-allocate-too-many-workers approach.
>
> I guess my problem is that I don't agree with that as a goal in and of
> itself. If you actually want to run your query quickly, you *want* 20
> workers here. The issues of backend startup overhead (already via
> parallel_setup_cost), concurrency and such costs should be modelled, not
> buried in a formula the user can't change. If we want to make it less
> and less likely to start more workers we should make that configurable,
> not the default.
>
> I think there's some precedent taken from the parallel seqscan case,
> that's not actually applicable here. Parallel seqscans have a good
> amount of shared state, both on the kernel and pg side, and that shared
> state reduces gains of increasing the number of workers. But with
> non-partial scans such shared state largely doesn't exist.

After searching through earlier mails about parallel scan, I am not
sure whether shared state was considered a potential factor that might
reduce parallel query gains when the worker-count calculation for a
parallel seq scan was decided. I mean, even today, if we allocate 10
workers instead of the calculated 4 for a parallel seq scan, they might
help. I think it's just that we don't know whether they would *always*
help or would sometimes cause a regression.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 4, 2017 at 12:37 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Attached is an updated patch v13 that has some comments changed as per
> your review, and also rebased on latest master.

This is not applicable on the latest head, i.e. commit
08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs a rebase.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
On 30 June 2017 at 15:10, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
> On Tue, Apr 4, 2017 at 12:37 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> Attached is an updated patch v13 that has some comments changed as per
>> your review, and also rebased on latest master.
>
> This is not applicable on the latest head, i.e. commit
> 08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs a rebase.

Thanks for notifying. Attached is the rebased version of the patch.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
ParallelAppend_v13_rebased.patch
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 80d1679..8639922 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3655,6 +3655,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index be3dc67..e7396f3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,7 +845,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1109,6 +1109,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ce47f1d..26e0a28 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -231,6 +232,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -295,6 +300,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -798,6 +807,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index aae5e3f..539c75e 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,47 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
+
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i] : true when no new worker should pick subplan i. A
+ * worker which finishes a subplan must set its pa_finished flag to true,
+ * so that no new worker picks this subplan. For a non-partial subplan,
+ * the worker which picks it up must immediately set the flag to true, so
+ * as to make sure no more than one worker is ever assigned to it.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
-static bool exec_append_initialize_next(AppendState *appendstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -72,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -182,10 +229,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -199,6 +246,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
TupleTableSlot *
ExecAppend(AppendState *node)
{
+ /*
+ * Check whether the parallel append has already run out of subplans.
+ * This can happen if all the subplans were finished before this worker
+ * even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -225,16 +280,31 @@ ExecAppend(AppendState *node)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is not a parallel-aware Append. Just follow the usual logic
+ * of sequentially choosing the next subplan.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point for new workers to run this subplan, so
+ * mark this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -272,6 +342,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * estimates the space required for the shared coordination state of the Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id, false);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter to start looking for an unfinished plan,
+ * executes that subplan, and then advances the counter to the next
+ * subplan, so that other workers know which plan to choose next. This
+ * way, workers pick the subplans in round-robin order and thus get
+ * evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: there is a chance that just after the child plan node is
+ * chosen above, some other worker finishes that node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * the node is already finished, and then this worker will once again
+ * try to choose the next subplan; but this is OK: it's just an extra
+ * "choose_next_subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute; stop executing, and set
+ * pa_next_plan so that other workers know there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and shift
+ * ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only if it's a parallel leader. The backend should scan
+ * backwards from the last plan. This is to prevent it from taking up
+ * the most expensive non-partial plan, i.e. the first subplan.
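+ * (Since create_append_path keeps the non-partial subplans at the front
+ * sorted by descending cost, scanning backwards makes the leader pick
+ * the cheapest remaining subplan, leaving the expensive ones to the
+ * other workers.)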
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either go to the next index, or wrap around to the first unfinished one.
+ * Returns this next index. While wrapping around, if the first unfinished
+ * one itself is past the last plan, returns PA_INVALID_PLAN.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If the first_plan itself is the last
+ * plan or if it is past the last plan, that means there is no next
+ * plan remaining. Return Invalid.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 67ac814..3bf0f85 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -242,6 +242,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index acaf4b5..75761a9 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built, but the cells of the
+ * original list are re-used. The caller must pass a copy of the list if
+ * the original needs to remain untouched. The comparator function is
+ * effectively passed pointers to ListCell pointers (i.e., ListCell **).
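+ *
+ * A comparator therefore dereferences twice; e.g. (sketch, mirroring
+ * append_total_cost_compare in pathnode.c):
+ *
+ *	Path *p1 = (Path *) lfirst(*(ListCell **) a);
+ *	Path *p2 = (Path *) lfirst(*(ListCell **) b);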
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3a23f0b..688f16f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -376,6 +376,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2988e8b..48e3973 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1579,6 +1579,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f087ddb..306fc1e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1281,7 +1284,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1317,7 +1324,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel has neither a partial path nor a parallel-safe
+ * path. Give up on the parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Only one of the two is valid. Choose that one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1395,23 +1460,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1421,9 +1502,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1476,7 +1557,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1694,6 +1776,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate
+ * lists are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all of its child plans have to
+ * be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all of its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1713,7 +1867,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index eb653cf..5bbc683 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,8 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(List *subpaths, int numpaths,
+ int parallel_workers);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1741,6 +1744,189 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of the non-partial paths of an Append
+ * node.
+ *
+ * This is the total cost taken by all the workers to finish all of the
+ * non-partial subpaths. 'subpaths' contains the non-partial paths followed
+ * by the partial paths; 'numpaths' gives the number of non-partial paths.
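+ *
+ * For example (illustrative numbers): with parallel_workers = 2 and
+ * non-partial subpath costs (20, 10, 5, 2), the two workers effectively
+ * accumulate costs 20 and 10 + 5 + 2 = 17, so the function returns 20,
+ * the time at which the last worker finishes its non-partial work.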
+ */
+static Cost
+append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
+{
+ Cost *costarr;
+ int arrlen;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int path_index;
+ int min_index;
+ int max_index;
+
+ if (numpaths == 0)
+ return 0;
+
+ /*
+ * Build the array containing the costs of the first n subpaths, where
+ * n = Min(parallel_workers, numpaths).
+ */
+ arrlen = Min(parallel_workers, numpaths);
+ costarr = (Cost *) palloc(sizeof(Cost) * arrlen);
+ path_index = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+
+ if (path_index == arrlen)
+ break;
+ costarr[path_index++] = subpath->total_cost;
+ }
+
+ /*
+ * The non-partial subpaths are kept sorted by descending cost, so the
+ * array starts out in decreasing order; hence the last element is the
+ * one with the minimum cost.
+ */
+ min_index = arrlen - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+ int i;
+
+ /* Consider only the non-partial paths */
+ if (path_index++ == numpaths)
+ break;
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+ for (max_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (list_length(subpaths) == 0)
+ return;
+
+ if (!path->parallel_aware)
+ {
+ Path *subpath = (Path *) linitial(subpaths);
+
+ /*
+ * Startup cost of non-parallel-aware Append is the startup cost of
+ * first subpath.
+ */
+ path->startup_cost = subpath->startup_cost;
+
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ else /* parallel-aware */
+ {
+ double max_rows = 0;
+ double nonpartial_rows = 0;
+ int i = 0;
+
+ /* Include the non-partial paths total cost */
+ path->total_cost += append_nonpartial_cost(subpaths,
+ num_nonpartial_subpaths,
+ path->parallel_workers);
+
+ /* Calculate startup cost; also add up all the rows for later use */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * Append starts returning tuples when the child node with the
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (i < path->parallel_workers)
+ {
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+ }
+
+ if (i < num_nonpartial_subpaths)
+ {
+ nonpartial_rows += subpath->rows;
+
+ /* Also keep track of max rows for any given subpath */
+ max_rows = Max(max_rows, subpath->rows);
+ }
+
+ i++;
+ }
+
+ /*
+ * As an approximation, the rows returned by the non-partial paths are
+ * estimated as their total divided by the number of workers. But if
+ * the row counts are highly unequal across the paths, this figure can
+ * be misleading, so we also clamp it to be no less than the maximum
+ * of all the per-path row counts.
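+ *
+ * For example (illustrative): with 2 workers and non-partial row
+ * counts of 100 and 10, the average is 55, but the worker running the
+ * 100-row subpath still returns all 100 rows, so we use
+ * Max(55, 100) = 100.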
+ */
+ nonpartial_rows /= path->parallel_workers;
+ path->rows += Max(nonpartial_rows, max_rows);
+
+ /* Calculate partial paths cost. */
+ if (list_length(subpaths) > num_nonpartial_subpaths)
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ for_each_cell(l, list_nth_cell(subpaths, num_nonpartial_subpaths))
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6ee2350..0eee647 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e589d92..a1297d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -203,7 +203,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
Index scanrelid, char *enrname);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1049,7 +1050,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5269,7 +5271,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5280,6 +5282,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2988c11..7d439d8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3589,8 +3589,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index cf46b74..64479ce 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -576,8 +576,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -688,7 +688,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f2d6385..0b79f0e 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,70 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The formula log2(number_of_subpaths) + 1 seems to give an appropriate
+ * number of workers for an Append path that either has a high number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths with
+ * only 1-2 parallel_workers each. But if a subpath's parallel_workers is
+ * high, this formula is not suitable, because it does not take the
+ * per-subpath workers into account. For example, with 3 subplans having
+ * per-subplan workers of (2, 8, 8), the Append should get at least 8
+ * workers, whereas the formula gives 2. In that case, it seems better to
+ * follow the method used for calculating the parallel_workers of an
+ * unpartitioned table, log3(table_size): treat the partitioned table as if
+ * its data belonged to a single unpartitioned table and derive the workers
+ * from that, i.e. logb(b^w1 + b^w2 + b^w3), where w1, w2, ... are the
+ * per-subplan workers and b is some logarithmic base such as 2 or 3. It
+ * turns out that this evaluates to a value just a bit greater than
+ * max(w1, w2, w3), so we just use the maximum of the per-subplan workers.
+ * But that formula gives too few workers when all paths have a single
+ * worker (meaning they are non-partial): with workers (1, 1, 1, 1, 1, 1),
+ * it is better to allocate 3 workers, whereas it allocates only 1. So we
+ * use whichever method gives the higher number of workers.
+ */
+
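+ /*
+ * Worked example (illustrative): with 6 subpaths, fls(6) = 3, so the
+ * log2-based method suggests 3 + 1 = 4 workers; if the maximum
+ * per-subpath parallel_workers is 8, the max-based method wins and we
+ * request 8 + 1 = 9 workers, finally capped at
+ * max_parallel_workers_per_gather.
+ */
+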
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1265,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,43 +1279,50 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered unsorted */
pathnode->partitioned_rels = list_copy(partitioned_rels);
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 82a1cf5..f2770fa 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 128;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 82e54c0..2dba157 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -910,6 +910,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2b1ebb7..dbf0cb3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -294,6 +294,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index ee0b6ad..d47163b 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,11 +14,15 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern TupleTableSlot *ExecAppend(AppendState *node);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 85fac8a..3f8b124 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -976,12 +977,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 667d5e2..711db92 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -269,6 +269,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f1a1b24..74da90d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -248,6 +248,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9bae3c6..247cc34 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1167,10 +1167,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
*/
typedef struct AppendPath
{
@@ -1178,6 +1182,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 63feba0..8e66cf0 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -105,6 +106,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0c0549d..40d31bb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3d16132..35adf12 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 35d182d..6ab7cc7 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 3e35e96..f5bb820 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 70fe971..6cdc009 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index d2d262c..4b07c03 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On Wed, Jul 5, 2017 at 7:53 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> This is not applicable on the latest head, i.e. commit
>> 08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs a rebase.
>
> Thanks for notifying. Attached is the rebased version of the patch.
This again needs a rebase.
(And, hey everybody, it also needs some review!)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 9 August 2017 at 19:05, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jul 5, 2017 at 7:53 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> This is not applicable on the latest head, i.e. commit
>>> 08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs a rebase.
>>
>> Thanks for notifying. Attached is the rebased version of the patch.
>
> This again needs a rebase.

Attached is the rebased version of the patch. Thanks.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
ParallelAppend_v13_rebased_2.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c33d6a0..d844b99 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3679,6 +3679,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-parallelappend" xreflabel="enable_parallelappend">
+ <term><varname>enable_parallelappend</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_parallelappend</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of parallel-aware
+ append plan types. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12d5628..d365d0b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,7 +845,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="59"><literal>LWLock</></entry>
+ <entry morerows="60"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1109,6 +1109,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for TBM shared iterator lock.</entry>
</row>
<row>
+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>
<entry morerows="9"><literal>Lock</></entry>
<entry><literal>relation</></entry>
<entry>Waiting to acquire a lock on a relation.</entry>
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ce47f1d..26e0a28 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -25,6 +25,7 @@
#include "executor/execParallel.h"
#include "executor/executor.h"
+#include "executor/nodeAppend.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
@@ -231,6 +232,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
ExecForeignScanEstimate((ForeignScanState *) planstate,
e->pcxt);
break;
+ case T_AppendState:
+ ExecAppendEstimate((AppendState *) planstate,
+ e->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanEstimate((CustomScanState *) planstate,
e->pcxt);
@@ -295,6 +300,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
ExecForeignScanInitializeDSM((ForeignScanState *) planstate,
d->pcxt);
break;
+ case T_AppendState:
+ ExecAppendInitializeDSM((AppendState *) planstate,
+ d->pcxt);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeDSM((CustomScanState *) planstate,
d->pcxt);
@@ -798,6 +807,9 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
ExecForeignScanInitializeWorker((ForeignScanState *) planstate,
toc);
break;
+ case T_AppendState:
+ ExecAppendInitializeWorker((AppendState *) planstate, toc);
+ break;
case T_CustomScanState:
ExecCustomScanInitializeWorker((CustomScanState *) planstate,
toc);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index bed9bb8..11f9688 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -60,10 +60,46 @@
#include "executor/execdebug.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "storage/spin.h"
-static TupleTableSlot *ExecAppend(PlanState *pstate);
-static bool exec_append_initialize_next(AppendState *appendstate);
+/*
+ * Shared state for Parallel Append.
+ *
+ * Each backend participating in a Parallel Append has its own
+ * descriptor in backend-private memory, and those objects all contain
+ * a pointer to this structure.
+ */
+typedef struct ParallelAppendDescData
+{
+ LWLock pa_lock; /* mutual exclusion to choose next subplan */
+ int pa_first_plan; /* plan to choose while wrapping around plans */
+ int pa_next_plan; /* next plan to choose by any worker */
+
+ /*
+ * pa_finished[i] indicates that subplan i needs no more workers. A
+ * worker which finishes a subplan sets its pa_finished flag to true, so
+ * that no new worker picks that subplan. For a non-partial subplan, the
+ * worker which picks it up sets the flag immediately, to make sure that
+ * no more than one worker is ever assigned to that subplan.
+ */
+ bool pa_finished[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;
+
+typedef ParallelAppendDescData *ParallelAppendDesc;
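+
+/*
+ * Illustrative sizing: for an Append with three subplans, the shared
+ * chunk requested by ExecAppendEstimate below is
+ * offsetof(ParallelAppendDescData, pa_finished) + 3 * sizeof(bool) bytes.
+ */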
+
+/*
+ * Special value of AppendState->as_whichplan for Parallel Append, which
+ * indicates there are no plans left to be executed.
+ */
+#define PA_INVALID_PLAN -1
+static TupleTableSlot *ExecAppend(PlanState *pstate);
+static bool exec_append_seq_next(AppendState *appendstate);
+static bool exec_append_parallel_next(AppendState *state);
+static bool exec_append_leader_next(AppendState *state);
+static int exec_append_get_next_plan(int curplan, int first_plan,
+ int last_plan);
/* ----------------------------------------------------------------
* exec_append_initialize_next
@@ -74,11 +110,20 @@ static bool exec_append_initialize_next(AppendState *appendstate);
* ----------------------------------------------------------------
*/
static bool
-exec_append_initialize_next(AppendState *appendstate)
+exec_append_seq_next(AppendState *appendstate)
{
int whichplan;
/*
+ * Not parallel-aware. Fine, just go on to the next subplan in the
+ * appropriate direction.
+ */
+ if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
+ appendstate->as_whichplan++;
+ else
+ appendstate->as_whichplan--;
+
+ /*
* get information from the append node
*/
whichplan = appendstate->as_whichplan;
@@ -185,10 +230,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.ps_ProjInfo = NULL;
/*
- * initialize to scan first subplan
+ * Initialize to scan first subplan (but note that we'll override this
+ * later in the case of a parallel append).
*/
appendstate->as_whichplan = 0;
- exec_append_initialize_next(appendstate);
return appendstate;
}
@@ -204,6 +249,14 @@ ExecAppend(PlanState *pstate)
{
AppendState *node = castNode(AppendState, pstate);
+ /*
+ * Check whether we have already run out of subplans in a parallel
+ * append. This can happen if all the subplans were finished before
+ * this worker even started returning tuples.
+ */
+ if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
for (;;)
{
PlanState *subnode;
@@ -232,16 +285,31 @@ ExecAppend(PlanState *pstate)
}
/*
- * Go on to the "next" subplan in the appropriate direction. If no
- * more subplans, return the empty slot set up for us by
- * ExecInitAppend.
+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
*/
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- node->as_whichplan++;
+ if (!node->as_padesc)
+ {
+ /*
+ * This is not a parallel-aware Append; just go on to the next
+ * subplan in the appropriate direction.
+ */
+ if (!exec_append_seq_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
else
- node->as_whichplan--;
- if (!exec_append_initialize_next(node))
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ {
+ /*
+ * We are done with this subplan. There might be other workers
+ * still processing the last chunk of rows for this same subplan,
+ * but there's no point in new workers picking it up, so mark
+ * this subplan as finished.
+ */
+ node->as_padesc->pa_finished[node->as_whichplan] = true;
+
+ if (!exec_append_parallel_next(node))
+ return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ }
/* Else loop back and try to get a tuple from the new subplan */
}
@@ -279,6 +347,7 @@ void
ExecReScanAppend(AppendState *node)
{
int i;
+ ParallelAppendDesc padesc = node->as_padesc;
for (i = 0; i < node->as_nplans; i++)
{
@@ -298,6 +367,276 @@ ExecReScanAppend(AppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+
node->as_whichplan = 0;
- exec_append_initialize_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Append Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecAppendEstimate
+ *
+ * Estimates the space required to serialize the Append node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendEstimate(AppendState *node,
+ ParallelContext *pcxt)
+{
+ node->pappend_len =
+ add_size(offsetof(struct ParallelAppendDescData, pa_finished),
+ sizeof(bool) * node->as_nplans);
+
+ shm_toc_estimate_chunk(&pcxt->estimator, node->pappend_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeDSM
+ *
+ * Set up a Parallel Append descriptor.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeDSM(AppendState *node,
+ ParallelContext *pcxt)
+{
+ ParallelAppendDesc padesc;
+
+ padesc = shm_toc_allocate(pcxt->toc, node->pappend_len);
+
+ /*
+ * Just setting all the fields to 0 is enough. The logic of choosing the
+ * next plan in workers will take care of everything else.
+ */
+ memset(padesc, 0, sizeof(ParallelAppendDescData));
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+
+ LWLockInitialize(&padesc->pa_lock, LWTRANCHE_PARALLEL_APPEND);
+
+ node->as_padesc = padesc;
+
+ shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, padesc);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecAppendInitializeWorker
+ *
+ * Copy relevant information from TOC into planstate, and initialize
+ * whatever is required to choose and execute the optimal subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAppendInitializeWorker(AppendState *node, shm_toc *toc)
+{
+ node->as_padesc = shm_toc_lookup(toc, node->ps.plan->plan_node_id, false);
+
+ /* Choose the first subplan to be executed. */
+ (void) exec_append_parallel_next(node);
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_parallel_next
+ *
+ * Determine the next subplan that should be executed. Each worker uses a
+ * shared next_subplan counter to start looking for an unfinished plan,
+ * executes that subplan, and then advances the counter to the next
+ * subplan, so that other workers know which plan to choose next. This
+ * way, workers choose the subplans in round-robin order, and thus get
+ * evenly distributed among the subplans.
+ *
+ * Returns false if and only if all subplans are already finished
+ * processing.
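+ *
+ * For example (hypothetical state): with four unfinished subplans and
+ * pa_next_plan = 2, successive workers pick subplans 2, 3, 0 and 1,
+ * wrapping around from the last plan back to pa_first_plan.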
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_parallel_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int whichplan;
+ int initial_plan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+ bool found;
+
+ Assert(padesc != NULL);
+
+ /* Backward scan is not supported by parallel-aware plans */
+ Assert(ScanDirectionIsForward(state->ps.state->es_direction));
+
+ /* The parallel leader chooses its next subplan differently */
+ if (!IsParallelWorker())
+ return exec_append_leader_next(state);
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* Make a note of which subplan we have started with */
+ initial_plan = padesc->pa_next_plan;
+
+ /*
+ * Keep going to the next plan until we find an unfinished one. In the
+ * process, also keep track of the first unfinished non-partial subplan. As
+ * the non-partial subplans are taken one by one, the first unfinished
+ * subplan index will shift ahead, so that we don't have to visit the
+ * finished non-partial ones anymore.
+ */
+
+ found = false;
+ for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
+ {
+ /*
+ * Ignore plans that are already done processing. These also include
+ * non-partial subplans which have already been taken by a worker.
+ */
+ if (!padesc->pa_finished[whichplan])
+ {
+ found = true;
+ break;
+ }
+
+ /*
+ * Note: there is a chance that, just after the child plan node is
+ * chosen above, some other worker finishes this node and sets
+ * pa_finished to true. In that case, this worker will go ahead and
+ * call ExecProcNode(child_node), which will return a NULL tuple since
+ * the node is already finished, and then this worker will once again
+ * try to choose the next subplan; but this is OK: it's just an extra
+ * "choose_next_subplan" operation.
+ */
+
+ /* Either go to the next plan, or wrap around to the first one */
+ whichplan = exec_append_get_next_plan(whichplan, padesc->pa_first_plan,
+ state->as_nplans - 1);
+
+ /*
+ * If we have wrapped around and returned to the same index again, we
+ * are done scanning.
+ */
+ if (whichplan == initial_plan)
+ break;
+ }
+
+ if (!found)
+ {
+ /*
+ * We didn't find any plan to execute. Stop executing, and mark the
+ * shared next-plan index invalid so that other workers also know
+ * there is no next plan.
+ */
+ padesc->pa_next_plan = state->as_whichplan = PA_INVALID_PLAN;
+ }
+ else
+ {
+ /*
+ * If this is a non-partial plan, immediately mark it finished, and
+ * shift ahead pa_first_plan.
+ */
+ if (whichplan < first_partial_plan)
+ {
+ padesc->pa_finished[whichplan] = true;
+ padesc->pa_first_plan = whichplan + 1;
+ }
+
+ /*
+ * Set the chosen plan, and the next plan to be picked by other
+ * workers.
+ */
+ state->as_whichplan = whichplan;
+ padesc->pa_next_plan = exec_append_get_next_plan(whichplan,
+ padesc->pa_first_plan,
+ state->as_nplans - 1);
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ return found;
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_leader_next
+ *
+ * To be used only by the parallel leader. The leader scans backwards
+ * from the last plan, so as to prevent it from taking up the most
+ * expensive non-partial plan, i.e. the first subplan.
+ * ----------------------------------------------------------------
+ */
+static bool
+exec_append_leader_next(AppendState *state)
+{
+ ParallelAppendDesc padesc = state->as_padesc;
+ int first_plan;
+ int whichplan;
+ int first_partial_plan = ((Append *)state->ps.plan)->first_partial_plan;
+
+ LWLockAcquire(&padesc->pa_lock, LW_EXCLUSIVE);
+
+ /* The parallel leader should start from the last subplan. */
+ first_plan = padesc->pa_first_plan;
+
+ for (whichplan = state->as_nplans - 1; whichplan >= first_plan;
+ whichplan--)
+ {
+ if (!padesc->pa_finished[whichplan])
+ {
+ /* If this is a non-partial plan, immediately mark it finished */
+ if (whichplan < first_partial_plan)
+ padesc->pa_finished[whichplan] = true;
+
+ break;
+ }
+ }
+
+ LWLockRelease(&padesc->pa_lock);
+
+ /* Return false only if we didn't find any plan to execute */
+ if (whichplan < first_plan)
+ {
+ state->as_whichplan = PA_INVALID_PLAN;
+ return false;
+ }
+ else
+ {
+ state->as_whichplan = whichplan;
+ return true;
+ }
+}
+
+/* ----------------------------------------------------------------
+ * exec_append_get_next_plan
+ *
+ * Either advance to the next index, or wrap around to the first
+ * unfinished one. Returns this next index, or PA_INVALID_PLAN if, while
+ * wrapping around, even the first unfinished plan is past the last plan.
+ * ----------------------------------------------------------------
+ */
+static int
+exec_append_get_next_plan(int curplan, int first_plan, int last_plan)
+{
+ Assert(curplan <= last_plan);
+
+ if (curplan < last_plan)
+ return curplan + 1;
+ else
+ {
+ /*
+ * We are already at the last plan. If first_plan itself is the last
+ * plan or is past it, there is no next plan remaining; return
+ * PA_INVALID_PLAN.
+ */
+ if (first_plan >= last_plan)
+ return PA_INVALID_PLAN;
+
+ return first_plan;
+ }
}
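
For reference, the executor functions above coordinate through a ParallelAppendDescData stored in the DSM segment. Its definition is not part of this excerpt; judging from the fields used above (pa_lock, pa_first_plan, pa_next_plan, and pa_finished[] sized via offsetof in ExecAppendEstimate), it should look roughly like the sketch below, which is illustrative rather than authoritative:

struct ParallelAppendDescData
{
    LWLock  pa_lock;          /* protects the fields below */
    int     pa_first_plan;    /* first subplan that may still be unfinished */
    int     pa_next_plan;     /* next subplan for a worker to pick up */
    bool    pa_finished[FLEXIBLE_ARRAY_MEMBER];  /* per-subplan done flags */
};

As a concrete trace of the round-robin logic: with four subplans where 0 and 1 are non-partial and 2 and 3 are partial, successive calls to exec_append_parallel_next() hand out subplans 0, 1, 2, 3, 2, 3, and so on; subplans 0 and 1 are marked finished as soon as they are taken and pa_first_plan advances past them, so later wrap-arounds cycle only over the partial subplans.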
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 45a04b0..dd23eae 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -242,6 +242,7 @@ _copyAppend(const Append *from)
*/
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(first_partial_plan);
return newnode;
}
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index acaf4b5..75761a9 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1250,6 +1250,45 @@ list_copy_tail(const List *oldlist, int nskip)
}
/*
+ * Sort a list using qsort. A sorted list is built, but the cells of the
+ * original list are re-used; the caller must pass a copy of the list if the
+ * original needs to stay untouched. Effectively, the comparator function is
+ * passed pointers to ListCell* pointers.
+ */
+List *
+list_qsort(const List *list, list_qsort_comparator cmp)
+{
+ ListCell *cell;
+ int i;
+ int len = list_length(list);
+ ListCell **list_arr;
+ List *new_list;
+
+ if (len == 0)
+ return NIL;
+
+ i = 0;
+ list_arr = palloc(sizeof(ListCell *) * len);
+ foreach(cell, list)
+ list_arr[i++] = cell;
+
+ qsort(list_arr, len, sizeof(ListCell *), cmp);
+
+ new_list = (List *) palloc(sizeof(List));
+ new_list->type = T_List;
+ new_list->length = len;
+ new_list->head = list_arr[0];
+ new_list->tail = list_arr[len-1];
+
+ for (i = 0; i < len-1; i++)
+ list_arr[i]->next = list_arr[i+1];
+
+ list_arr[len-1]->next = NULL;
+ pfree(list_arr);
+ return new_list;
+}
+
+/*
* Temporary compatibility functions
*
* In order to avoid warnings for these function definitions, we need
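
The comparator contract here mirrors qsort() on an array of ListCell pointers, so a comparator must double-dereference its arguments (append_total_cost_compare in pathnode.c below is the in-tree user). A minimal hypothetical usage sketch, with cmp_paths_asc and pathlist as illustrative names:

/* Hypothetical comparator: order Paths by ascending total cost. */
static int
cmp_paths_asc(const void *a, const void *b)
{
    Path *p1 = (Path *) lfirst(*(ListCell **) a);
    Path *p2 = (Path *) lfirst(*(ListCell **) b);

    if (p1->total_cost < p2->total_cost)
        return -1;
    if (p1->total_cost > p2->total_cost)
        return 1;
    return 0;
}

/* The original list's cells are reused, so copy it first if it must survive: */
List *sorted_paths = list_qsort(list_copy(pathlist), cmp_paths_asc);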
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 379d92a..167a28b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -394,6 +394,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
+ WRITE_INT_FIELD(first_partial_plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 86c811d..51210b5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1594,6 +1594,7 @@ _readAppend(void)
READ_NODE_FIELD(partitioned_rels);
READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(first_partial_plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f087ddb..306fc1e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -101,6 +101,9 @@ static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
RelOptInfo *rel,
Relids required_outer);
static List *accumulate_append_subpath(List *subpaths, Path *path);
+static List *accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1281,7 +1284,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
+ List *pa_partial_subpaths = NIL;
+ List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
+ bool pa_subpaths_valid = enable_parallelappend;
+ bool pa_all_partial_subpaths = enable_parallelappend;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
@@ -1317,7 +1324,65 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
subpaths_valid = false;
- /* Same idea, but for a partial plan. */
+ /* Same idea, but for a parallel append path. */
+ if (pa_subpaths_valid && enable_parallelappend)
+ {
+ Path *chosen_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ Path *cheapest_parallel_safe_path = NULL;
+
+ /*
+ * Extract the cheapest unparameterized, parallel-safe one among
+ * the child paths.
+ */
+ cheapest_parallel_safe_path =
+ get_cheapest_parallel_safe_total_inner(childrel->pathlist);
+
+ /* Get the cheapest partial path */
+ if (childrel->partial_pathlist != NIL)
+ cheapest_partial_path = linitial(childrel->partial_pathlist);
+
+ if (!cheapest_parallel_safe_path && !cheapest_partial_path)
+ {
+ /*
+ * This child rel neither has a partial path, nor has a
+ * parallel-safe path. Drop the idea for parallel append.
+ */
+ pa_subpaths_valid = false;
+ }
+ else if (cheapest_partial_path && cheapest_parallel_safe_path)
+ {
+ /* Both are valid. Choose the cheaper out of the two */
+ if (cheapest_parallel_safe_path->total_cost <
+ cheapest_partial_path->total_cost)
+ chosen_path = cheapest_parallel_safe_path;
+ else
+ chosen_path = cheapest_partial_path;
+ }
+ else
+ {
+ /* Either one is valid. Choose the valid one */
+ chosen_path = cheapest_partial_path ?
+ cheapest_partial_path :
+ cheapest_parallel_safe_path;
+ }
+
+ /* If we got a valid path, add it */
+ if (chosen_path)
+ {
+ pa_partial_subpaths =
+ accumulate_partialappend_subpath(
+ pa_partial_subpaths,
+ chosen_path,
+ chosen_path == cheapest_partial_path,
+ &pa_nonpartial_subpaths);
+ }
+
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
+ }
+
+ /* Same idea, but for a non-parallel partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
linitial(childrel->partial_pathlist));
@@ -1395,23 +1460,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
+ add_path(rel, (Path *) create_append_path(rel, subpaths, NIL,
+ NULL, 0, false,
partitioned_rels));
+ /* Consider parallel append path. */
+ if (pa_subpaths_valid)
+ {
+ AppendPath *appendpath;
+ int parallel_workers;
+
+ parallel_workers = get_append_num_workers(pa_partial_subpaths,
+ pa_nonpartial_subpaths);
+ appendpath = create_append_path(rel, pa_nonpartial_subpaths,
+ pa_partial_subpaths,
+ NULL, parallel_workers, true,
+ partitioned_rels);
+ add_partial_path(rel, (Path *) appendpath);
+ }
+
/*
- * Consider an append of partial unordered, unparameterized partial paths.
+ * Consider non-parallel partial append path. But if the parallel append
+ * path is made out of all partial subpaths, don't create another partial
+ * path; we will keep only the parallel append path in that case.
*/
- if (partial_subpaths_valid)
+ if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
- * Decide on the number of workers to request for this append path.
- * For now, we just use the maximum value from among the members. It
- * might be useful to use a higher number if the Append node were
- * smart enough to spread out the workers, but it currently isn't.
+ * To decide the number of workers, just use the maximum value from
+ * among the children.
*/
foreach(lc, partial_subpaths)
{
@@ -1421,9 +1502,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
}
Assert(parallel_workers > 0);
- /* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(rel, NIL, partial_subpaths,
+ NULL, parallel_workers, false,
+ partitioned_rels);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1476,7 +1557,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
+ create_append_path(rel, subpaths, NIL,
+ required_outer, 0, false,
partitioned_rels));
}
}
@@ -1694,6 +1776,78 @@ accumulate_append_subpath(List *subpaths, Path *path)
}
/*
+ * accumulate_partialappend_subpath:
+ * Add a subpath to the list being built for a partial Append.
+ *
+ * This is the same as accumulate_append_subpath, except that two separate lists
+ * are created, one containing only partial subpaths, and the other containing
+ * only non-partial subpaths. Also, the non-partial paths are kept ordered
+ * by descending total cost.
+ *
+ * is_partial is true if the subpath being added is a partial subpath.
+ */
+static List *
+accumulate_partialappend_subpath(List *partial_subpaths,
+ Path *subpath, bool is_partial,
+ List **nonpartial_subpaths)
+{
+ /* list_copy is important here to avoid sharing list substructure */
+
+ if (IsA(subpath, AppendPath))
+ {
+ AppendPath *apath = (AppendPath *) subpath;
+ List *apath_partial_paths;
+ List *apath_nonpartial_paths;
+
+ /* Split the Append subpaths into partial and non-partial paths */
+ apath_nonpartial_paths = list_truncate(list_copy(apath->subpaths),
+ apath->first_partial_path);
+ apath_partial_paths = list_copy_tail(apath->subpaths,
+ apath->first_partial_path);
+
+ /* Add non-partial subpaths, if any. */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(apath_nonpartial_paths));
+
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
+ }
+ else if (IsA(subpath, MergeAppendPath))
+ {
+ MergeAppendPath *mpath = (MergeAppendPath *) subpath;
+
+ /*
+ * If the MergeAppend is partial at all, all of its child plans have
+ * to be partial: we don't currently support a mix of partial and
+ * non-partial MergeAppend subpaths.
+ */
+ if (is_partial)
+ return list_concat(partial_subpaths, list_copy(mpath->subpaths));
+ else
+ {
+ /*
+ * Since the MergeAppendPath itself is non-partial, treat all of its
+ * subpaths as non-partial.
+ */
+ *nonpartial_subpaths = list_concat(*nonpartial_subpaths,
+ list_copy(mpath->subpaths));
+ return partial_subpaths;
+ }
+ }
+ else
+ {
+ /* Just add it to the right list depending upon whether it's partial */
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
+ else
+ {
+ *nonpartial_subpaths = lappend(*nonpartial_subpaths, subpath);
+ return partial_subpaths;
+ }
+ }
+}
+
+/*
* set_dummy_rel_pathlist
* Build a dummy path for a relation that's been excluded by constraints
*
@@ -1713,7 +1867,8 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b35acb7..fe677a3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -127,6 +127,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
bool enable_gathermerge = true;
+bool enable_parallelappend = true;
typedef struct
{
@@ -159,6 +160,8 @@ static Selectivity get_foreign_key_join_selectivity(PlannerInfo *root,
Relids inner_relids,
SpecialJoinInfo *sjinfo,
List **restrictlist);
+static Cost append_nonpartial_cost(List *subpaths, int numpaths,
+ int parallel_workers);
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
@@ -1741,6 +1744,189 @@ cost_sort(Path *path, PlannerInfo *root,
}
/*
+ * append_nonpartial_cost
+ * Determines and returns the cost of the non-partial subpaths of an
+ * Append node: the cost until the last worker finishes all the
+ * non-partial subpaths, i.e. the maximum of the per-worker accumulated
+ * costs.
+ *
+ * 'subpaths' contains the non-partial paths followed by the partial paths;
+ * 'numpaths' is the number of non-partial paths.
+ */
+static Cost
+append_nonpartial_cost(List *subpaths, int numpaths, int parallel_workers)
+{
+ Cost *costarr;
+ int arrlen;
+ ListCell *l;
+ ListCell *cell;
+ int i;
+ int path_index;
+ int min_index;
+ int max_index;
+
+ if (numpaths == 0)
+ return 0;
+
+ /*
+ * Build the cost array containing the costs of the first N subpaths,
+ * where N = Min(parallel_workers, numpaths): one element for each
+ * worker that immediately gets a subpath assigned.
+ */
+ arrlen = Min(parallel_workers, numpaths);
+ costarr = (Cost *) palloc(sizeof(Cost) * arrlen);
+ path_index = 0;
+ foreach(cell, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(cell);
+
+ if (path_index == arrlen)
+ break;
+ costarr[path_index++] = subpath->total_cost;
+ }
+
+ /*
+ * The non-partial subpaths are sorted by descending cost, so the array
+ * is initially in decreasing-cost order; hence the last element is the
+ * one with the minimum cost.
+ */
+ min_index = arrlen - 1;
+
+ /*
+ * For each of the remaining subpaths, add its cost to the array element
+ * with minimum cost.
+ */
+ for_each_cell(l, cell)
+ {
+ Path *subpath = (Path *) lfirst(l);
+ int i;
+
+ /* Consider only the non-partial paths */
+ if (path_index++ == numpaths)
+ break;
+
+ costarr[min_index] += subpath->total_cost;
+
+ /* Update the new min cost array index */
+ for (min_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] < costarr[min_index])
+ min_index = i;
+ }
+ }
+
+ /* Return the highest cost from the array */
+ for (max_index = i = 0; i < arrlen; i++)
+ {
+ if (costarr[i] > costarr[max_index])
+ max_index = i;
+ }
+
+ return costarr[max_index];
+}
+
+/*
+ * cost_append
+ * Determines and returns the cost of an Append node.
+ *
+ * We charge nothing extra for the Append itself, which perhaps is too
+ * optimistic, but since it doesn't do any selection or projection, it is a
+ * pretty cheap node.
+ */
+void
+cost_append(Path *path, List *subpaths, int num_nonpartial_subpaths)
+{
+ ListCell *l;
+
+ path->rows = 0;
+ path->startup_cost = 0;
+ path->total_cost = 0;
+
+ if (list_length(subpaths) == 0)
+ return;
+
+ if (!path->parallel_aware)
+ {
+ Path *subpath = (Path *) linitial(subpaths);
+
+ /*
+ * Startup cost of non-parallel-aware Append is the startup cost of
+ * first subpath.
+ */
+ path->startup_cost = subpath->startup_cost;
+
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ else /* parallel-aware */
+ {
+ double max_rows = 0;
+ double nonpartial_rows = 0;
+ int i = 0;
+
+ /* Include the non-partial paths total cost */
+ path->total_cost += append_nonpartial_cost(subpaths,
+ num_nonpartial_subpaths,
+ path->parallel_workers);
+
+ /* Calculate startup cost; also add up all the rows for later use */
+ foreach(l, subpaths)
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ /*
+ * Append would start returning tuples when the child node having
+ * lowest startup cost is done setting up. We consider only the
+ * first few subplans that immediately get a worker assigned.
+ */
+ if (i == 0)
+ path->startup_cost = subpath->startup_cost;
+ else if (i < path->parallel_workers)
+ path->startup_cost = Min(path->startup_cost,
+ subpath->startup_cost);
+
+ if (i < num_nonpartial_subpaths)
+ {
+ nonpartial_rows += subpath->rows;
+
+ /* Also keep track of max rows for any given subpath */
+ max_rows = Max(max_rows, subpath->rows);
+ }
+
+ i++;
+ }
+
+ /*
+ * As an approximation, the rows from the non-partial subpaths are
+ * divided evenly among the workers. But if the row counts are highly
+ * unequal across the subpaths, that figure can be an underestimate, so
+ * we also make sure it is not less than the maximum row count of any
+ * single subpath.
+ */
+ nonpartial_rows /= path->parallel_workers;
+ path->rows += Max(nonpartial_rows, max_rows);
+
+ /* Calculate partial paths cost. */
+ if (list_length(subpaths) > num_nonpartial_subpaths)
+ {
+ /* Compute rows and costs as sums of subplan rows and costs. */
+ for_each_cell(l, list_nth_cell(subpaths, num_nonpartial_subpaths))
+ {
+ Path *subpath = (Path *) lfirst(l);
+
+ path->rows += subpath->rows;
+ path->total_cost += subpath->total_cost;
+ }
+ }
+ }
+}
+
+/*
* cost_merge_append
* Determines and returns the cost of a MergeAppend node.
*
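
To make the greedy accounting in append_nonpartial_cost() concrete, here is a small worked example with made-up numbers: four non-partial subpaths with total costs 20, 16, 10 and 8 (already sorted descending) and parallel_workers = 2. The cost array starts with the first two costs, and each remaining non-partial cost is added to the currently cheapest slot:

    costarr = [20, 16]              (first arrlen = Min(2, 4) subpaths)
    add 10 to the min slot (16) ->  [20, 26]
    add  8 to the min slot (20) ->  [28, 26]
    result = max(28, 26) = 28

That is, one worker is assumed to run the subpaths costing {20, 8} while the other runs {16, 10}, and the Append is charged for the slower of the two.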
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6ee2350..0eee647 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,8 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(rel, NIL, NIL, NULL,
+ 0, false, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5c934f2..9c7a6d6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -203,7 +203,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
Index scanrelid, char *enrname);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(List *appendplans, int first_partial_plan,
+ List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -1049,7 +1050,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
* parent-rel Vars it'll be asked to emit.
*/
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
+ plan = make_append(subplans, best_path->first_partial_path,
+ tlist, best_path->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5270,7 +5272,7 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(List *appendplans, int first_partial_plan, List *tlist, List *partitioned_rels)
{
Append *node = makeNode(Append);
Plan *plan = &node->plan;
@@ -5281,6 +5283,7 @@ make_append(List *appendplans, List *tlist, List *partitioned_rels)
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
node->appendplans = appendplans;
+ node->first_partial_plan = first_partial_plan;
return node;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2988c11..7d439d8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3589,8 +3589,10 @@ create_grouping_paths(PlannerInfo *root,
path = (Path *)
create_append_path(grouped_rel,
paths,
+ NIL,
NULL,
0,
+ false,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index cf46b74..64479ce 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -576,8 +576,8 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
-
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -688,7 +688,8 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(result_rel, pathlist, NIL,
+ NULL, 0, false, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f2d6385..0b79f0e 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,6 +46,7 @@ typedef enum
#define STD_FUZZ_FACTOR 1.01
static List *translate_sub_tlist(List *tlist, int relid);
+static int append_total_cost_compare(const void *a, const void *b);
/*****************************************************************************
@@ -1193,6 +1194,70 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
}
/*
+ * get_append_num_workers
+ * Return the number of workers to request for partial append path.
+ */
+int
+get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
+{
+ ListCell *lc;
+ double log2w;
+ int num_workers;
+ int max_per_plan_workers;
+
+ /*
+ * The formula log2(num_subpaths) + 1 seems to give an appropriate number
+ * of workers for an Append path that either has a large number of
+ * children (> 100), or has all non-partial subpaths, or has subpaths
+ * with only 1-2 parallel workers each. But when the subpaths'
+ * parallel_workers values are high, this formula is not suitable,
+ * because it does not take the per-subpath workers into account. For
+ * example, with 3 subplans having per-subplan workers (2, 8, 8), the
+ * Append should get at least 8 workers, whereas the formula gives 2. In
+ * this case, it seems better to follow the method used for calculating
+ * parallel_workers of an unpartitioned table, i.e. log3(table_size):
+ * treat the partitioned table as if its data belonged to a single
+ * unpartitioned table and derive the workers from that. This comes out
+ * to logb(b^w1 + b^w2 + b^w3), where w1, w2, ... are the per-subplan
+ * workers and b is some logarithmic base such as 2 or 3. It turns out
+ * that this evaluates to a value just a bit greater than
+ * max(w1, w2, w3), so we simply use the maximum of the per-subplan
+ * workers. But that formula gives too few workers when all paths have a
+ * single worker (meaning they are non-partial); for example, with
+ * workers (1, 1, 1, 1, 1, 1) it is better to allocate 3 workers,
+ * whereas it allocates only 1. So we use whichever of the two methods
+ * gives the higher number of workers.
+ */
+
+ /* Get log2(num_subpaths) */
+ log2w = fls(list_length(partial_subpaths) +
+ list_length(nonpartial_subpaths));
+
+ /* Avoid further calculations if we already crossed max workers limit */
+ if (max_parallel_workers_per_gather <= log2w + 1)
+ return max_parallel_workers_per_gather;
+
+ /*
+ * Get the parallel_workers value of the partial subpath having the highest
+ * parallel_workers.
+ */
+ max_per_plan_workers = 1;
+ foreach(lc, partial_subpaths)
+ {
+ Path *subpath = lfirst(lc);
+ max_per_plan_workers = Max(max_per_plan_workers,
+ subpath->parallel_workers);
+ }
+
+ /* Choose the higher of the results of the two formulae */
+ num_workers = rint(Max(log2w, max_per_plan_workers) + 1);
+
+ /* In no case use more than max_parallel_workers_per_gather workers. */
+ num_workers = Min(num_workers, max_parallel_workers_per_gather);
+
+ return num_workers;
+}
+
+/*
* create_append_path
* Creates a path corresponding to an Append plan, returning the
* pathnode.
@@ -1200,8 +1265,11 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
@@ -1211,43 +1279,50 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.pathtarget = rel->reltarget;
pathnode->path.param_info = get_appendrel_parampathinfo(rel,
required_outer);
- pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_aware = (parallel_aware && parallel_workers > 0);
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
pathnode->path.pathkeys = NIL; /* result is always considered unsorted */
pathnode->partitioned_rels = list_copy(partitioned_rels);
- pathnode->subpaths = subpaths;
- /*
- * We don't bother with inventing a cost_append(), but just do it here.
- *
- * Compute rows and costs as sums of subplan rows and costs. We charge
- * nothing extra for the Append itself, which perhaps is too optimistic,
- * but since it doesn't do any selection or projection, it is a pretty
- * cheap node.
- */
- pathnode->path.rows = 0;
- pathnode->path.startup_cost = 0;
- pathnode->path.total_cost = 0;
+ /* For parallel append, non-partial paths are sorted by descending costs */
+ if (pathnode->path.parallel_aware)
+ subpaths = list_qsort(subpaths, append_total_cost_compare);
+
+ pathnode->first_partial_path = list_length(subpaths);
+ pathnode->subpaths = list_concat(subpaths, partial_subpaths);
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
- pathnode->path.rows += subpath->rows;
-
- if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
- subpath->parallel_safe;
+ subpath->parallel_safe;
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
+ cost_append(&pathnode->path, pathnode->subpaths,
+ pathnode->first_partial_path);
+
return pathnode;
}
+static int
+append_total_cost_compare(const void *a, const void *b)
+{
+ Path *path1 = (Path *) lfirst(*(ListCell **) a);
+ Path *path2 = (Path *) lfirst(*(ListCell **) b);
+
+ if (path1->total_cost > path2->total_cost)
+ return -1;
+ if (path1->total_cost < path2->total_cost)
+ return 1;
+
+ return 0;
+}
+
/*
* create_merge_append_path
* Creates a path corresponding to a MergeAppend plan, returning the
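
A quick worked example of get_append_num_workers(), with made-up inputs: given 3 partial subpaths whose parallel_workers are (2, 8, 8) plus 3 non-partial subpaths, fls(6) yields log2w = 3; assuming max_parallel_workers_per_gather = 16 (greater than log2w + 1), max_per_plan_workers comes out as 8, so num_workers = rint(Max(3, 8) + 1) = 9, which stays under the per-gather cap. Conversely, with six non-partial subpaths and no partial ones, max_per_plan_workers stays 1 and the log2-based term wins: num_workers = rint(Max(3, 1) + 1) = 4.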
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 82a1cf5..f2770fa 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 128;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+ LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 246fea8..0782aa3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -910,6 +910,15 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_parallelappend", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel append plans."),
+ NULL
+ },
+ &enable_parallelappend,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index df5d2f3..0a079b2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -297,6 +297,7 @@
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
+#enable_parallelappend = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4e38a13..7d9e881 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -14,10 +14,14 @@
#ifndef NODEAPPEND_H
#define NODEAPPEND_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
+extern void ExecAppendEstimate(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeDSM(AppendState *node, ParallelContext *pcxt);
+extern void ExecAppendInitializeWorker(AppendState *node, shm_toc *toc);
#endif /* NODEAPPEND_H */
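
These three entry points follow the pattern of the existing parallel-aware nodes, so they would presumably be dispatched from the planstate walkers in execParallel.c (that wiring is not shown in this excerpt); a sketch of the expected hookup, modeled on the existing T_SeqScanState cases:

/* In ExecParallelEstimate()'s switch, under planstate->plan->parallel_aware: */
case T_AppendState:
    ExecAppendEstimate((AppendState *) planstate, e->pcxt);
    break;

/* In ExecParallelInitializeDSM(): */
case T_AppendState:
    ExecAppendInitializeDSM((AppendState *) planstate, d->pcxt);
    break;

/* In ExecParallelInitializeWorker(): */
case T_AppendState:
    ExecAppendInitializeWorker((AppendState *) planstate, toc);
    break;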
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 35c28a6..ea76d4b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/spin.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -992,12 +993,15 @@ typedef struct ModifyTableState
* whichplan which plan is being executed (0 .. n-1)
* ----------------
*/
+struct ParallelAppendDescData;
typedef struct AppendState
{
PlanState ps; /* its first field is NodeTag */
PlanState **appendplans; /* array of PlanStates for my inputs */
int as_nplans;
int as_whichplan;
+ struct ParallelAppendDescData *as_padesc; /* parallel coordination info */
+ Size pappend_len; /* size of parallel coordination info */
} AppendState;
/* ----------------
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 667d5e2..711db92 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -269,6 +269,9 @@ extern void list_free_deep(List *list);
extern List *list_copy(const List *list);
extern List *list_copy_tail(const List *list, int nskip);
+typedef int (*list_qsort_comparator) (const void *a, const void *b);
+extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+
/*
* To ease migration to the new list API, a set of compatibility
* macros are provided that reduce the impact of the list API changes
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f1a1b24..74da90d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -248,6 +248,7 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ int first_partial_plan;
} Append;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9bae3c6..247cc34 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1167,10 +1167,14 @@ typedef struct CustomPath
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
+ * For partial Append, 'subpaths' contains non-partial subpaths followed by
+ * partial subpaths.
+ *
* Note: it is possible for "subpaths" to contain only one, or even no,
* elements. These cases are optimized during create_append_plan.
* In particular, an AppendPath with no subpaths is a "dummy" path that
* is created to represent the case that a relation is provably empty.
*/
typedef struct AppendPath
{
@@ -1178,6 +1182,9 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+
+ /* Index of first partial path in subpaths */
+ int first_partial_path;
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 63feba0..8e66cf0 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,6 +67,7 @@ extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
extern bool enable_gathermerge;
+extern bool enable_parallelappend;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
@@ -105,6 +106,8 @@ extern void cost_sort(Path *path, PlannerInfo *root,
List *pathkeys, Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
+extern void cost_append(Path *path, List *subpaths,
+ int num_nonpartial_subpaths);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0c0549d..40d31bb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -14,6 +14,7 @@
#ifndef PATHNODE_H
#define PATHNODE_H
+#include "nodes/bitmapset.h"
#include "nodes/relation.h"
@@ -63,9 +64,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern int get_append_num_workers(List *partial_subpaths,
+ List *nonpartial_subpaths);
+extern AppendPath *create_append_path(RelOptInfo *rel,
+ List *subpaths, List *partial_subpaths,
+ Relids required_outer,
+ int parallel_workers, bool parallel_aware,
+ List *partitioned_rels);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3d16132..35adf12 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_TBM,
+ LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 1fa9650..7a5b3c7 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1382,6 +1382,7 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN
------------------------------------------------------------------
@@ -1448,6 +1449,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table matest1
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 3e35e96..f5bb820 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -11,15 +11,16 @@ set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-----------------------------------------------------
Finalize Aggregate
-> Gather
- Workers Planned: 1
+ Workers Planned: 4
-> Partial Aggregate
- -> Append
+ -> Parallel Append
-> Parallel Seq Scan on a_star
-> Parallel Seq Scan on b_star
-> Parallel Seq Scan on c_star
@@ -28,12 +29,40 @@ explain (costs off)
-> Parallel Seq Scan on f_star
(11 rows)
-select count(*) from a_star;
- count
--------
- 50
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
+(1 row)
+
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+ QUERY PLAN
+-----------------------------------------------------
+ Finalize Aggregate
+ -> Gather
+ Workers Planned: 4
+ -> Partial Aggregate
+ -> Parallel Append
+ -> Seq Scan on d_star
+ -> Seq Scan on c_star
+ -> Parallel Seq Scan on a_star
+ -> Parallel Seq Scan on b_star
+ -> Parallel Seq Scan on e_star
+ -> Parallel Seq Scan on f_star
+(11 rows)
+
+select round(avg(aa)), sum(aa) from a_star;
+ round | sum
+-------+-----
+ 14 | 355
(1 row)
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
explain (verbose, costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 568b783..97a9843 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -70,21 +70,22 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
-- This is to record the prevailing planner enable_foo settings during
-- a regression test run.
select name, setting from pg_settings where name like 'enable%';
- name | setting
-----------------------+---------
- enable_bitmapscan | on
- enable_gathermerge | on
- enable_hashagg | on
- enable_hashjoin | on
- enable_indexonlyscan | on
- enable_indexscan | on
- enable_material | on
- enable_mergejoin | on
- enable_nestloop | on
- enable_seqscan | on
- enable_sort | on
- enable_tidscan | on
-(12 rows)
+ name | setting
+-----------------------+---------
+ enable_bitmapscan | on
+ enable_gathermerge | on
+ enable_hashagg | on
+ enable_hashjoin | on
+ enable_indexonlyscan | on
+ enable_indexscan | on
+ enable_material | on
+ enable_mergejoin | on
+ enable_nestloop | on
+ enable_parallelappend | on
+ enable_seqscan | on
+ enable_sort | on
+ enable_tidscan | on
+(13 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index c96580c..60ac387 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -491,11 +491,13 @@ select min(1-id) from matest0;
reset enable_indexscan;
set enable_seqscan = off; -- plan with fewest seqscans should be merge
+set enable_parallelappend = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_seqscan;
+reset enable_parallelappend;
drop table matest0 cascade;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index d2d262c..4b07c03 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -15,9 +15,18 @@ set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;
+-- test Parallel Append.
explain (costs off)
- select count(*) from a_star;
-select count(*) from a_star;
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+-- Mix of partial and non-partial subplans.
+alter table c_star set (parallel_workers = 0);
+alter table d_star set (parallel_workers = 0);
+explain (costs off)
+ select round(avg(aa)), sum(aa) from a_star;
+select round(avg(aa)), sum(aa) from a_star;
+alter table c_star reset (parallel_workers);
+alter table d_star reset (parallel_workers);
-- test that parallel_restricted function doesn't run in worker
alter table tenk1 set (parallel_workers = 4);
On Thu, Aug 10, 2017 at 11:04 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 9 August 2017 at 19:05, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 5, 2017 at 7:53 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
This is not applicable on the latest head, i.e. commit
08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs a rebase.
Thanks for notifying. Attached is the rebased version of the patch.
This again needs a rebase.
Attached is the rebased version of the patch. Thanks.
I tested this patch for partitioned TPC-H queries along with the
partition-wise join patches [1]. The experimental setup used is as
follows:
The tables lineitem and orders were partitioned on keys l_orderkey and
o_orderkey respectively, using a range partitioning scheme with 17
partitions per table. These experiments are at scale factor 20. Server
parameters were set as follows:
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
All times are in seconds.

Query | Head | ParallelAppend + PWJ | Patches used by query
------+------+----------------------+-----------------------------
Q1    |  395 |                  398 | PA only
Q3    |  130 |                   90 | PA only
Q4    |  244 |                   12 | PA and PWJ; PWJ alone: 41
Q5    |  123 |                   77 | PA only
Q6    |   29 |                   12 | PA only
Q7    |  134 |                   88 | PA only
Q9    | 1051 |                 1135 | PA only
Q10   |  111 |                   70 | PA and PWJ; PWJ alone: 89
Q12   |  114 |                   70 | PA and PWJ; PWJ alone: 100
Q14   |   13 |                   12 | PA only
Q18   |  508 |                  489 | PA only
Q21   |  649 |                  163 | PA only
To conclude, the patch works well for this benchmark, with no serious
regressions, at least at this scale factor, and the performance
improvement is significant. Please find attached the EXPLAIN ANALYZE
output of the queries.
[1]: CAFjFpRfy-YBL6AX3yeO30pAupTMQXgkxDc2P3XBK52QDzGtX5Q@mail.gmail.com
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Attachments:
PA_test.zip (application/zip)