WIP: [[Parallel] Shared] Hash

Started by Thomas Munroabout 9 years ago83 messages

thomas.munro@enterprisedb.com

about 9 years ago

Hi hackers,

In PostgreSQL 9.6, hash joins can be parallelised under certain
conditions, but a copy of the hash table is built in every
participating backend. That means that memory and CPU time are
wasted. In many cases, that's OK: if the hash table contents are
small and cheap to compute, then we don't really care, we're just
happy that the probing can be done in parallel. But in cases where
the hash table is large and/or expensive to build, we could do much
better. I am working on that problem.

To recap the situation in 9.6, a hash join can appear below a Gather
node and it looks much the same as a non-parallel hash join except
that it has a partial outer plan:

-> Hash Join
-> <partial outer plan>
-> Hash
-> <non-partial parallel-safe inner plan>

A partial plan is one that has some kind of 'scatter' operation as its
ultimate source of tuples. Currently the only kind of scatter
operation is a Parallel Seq Scan (but see also the Parallel Index Scan
and Parallel Bitmap Scan proposals). The scatter operation enables
parallelism in all the executor nodes above it, as far as the
enclosing 'gather' operation which must appear somewhere above it.
Currently the only kind of gather operation is a Gather node (but see
also the Gather Merge proposal which adds a new one).

The inner plan is built from a non-partial parallel-safe path and will
be run in every worker.

Note that a Hash Join node in 9.6 isn't parallel-aware itself: it's
not doing anything special at execution time to support parallelism.
The planner has determined that correct partial results will be
produced by this plan, but the executor nodes are blissfully unaware
of parallelism.

PROPOSED NEW PLAN VARIANTS

Shortly I will post a patch which introduces two new hash join plan
variants that are parallel-aware:

1. Parallel Hash Join with Shared Hash

-> Parallel Hash Join
-> <partial outer plan>
-> Shared Hash
-> <non-partial parallel-safe inner plan>

In this case, there is only one copy of the hash table and only one
participant loads it. The other participants wait patiently for one
chosen backend to finish building the hash table, and then they all
wake up and probe.

Call the number of participants P, being the number of workers + 1
(for the leader). Compared to a non-shared hash plan, we avoid
wasting CPU and IO resources running P copies of the inner plan in
parallel (something that is not well captured in our costing model for
parallel query today), and we can allow ourselves to use a hash table
P times larger while sticking to the same overall space target of
work_mem * P.

2. Parallel Hash Join with Parallel Shared Hash

-> Parallel Hash Join
-> <partial outer plan>
-> Parallel Shared Hash
-> <partial inner plan>

In this case, the inner plan is run in parallel by all participants.
We have the advantages of a shared hash table as described above, and
now we can also divide the work of running the inner plan and hashing
the resulting tuples by P participants. Note that Parallel Shared
Hash is acting as a special kind of gather operation that is the
counterpart to the scatter operation contained in the inner plan.

PERFORMANCE

So far I have been unable to measure any performance degradation
compared with unpatched master for hash joins with non-shared hash.
That's good because it means that I didn't slow existing plans down
when I introduced a bunch of conditional branches to existing hash
join code.

Laptop testing shows greater than 2x speedups on several of the TPC-H
queries with single batches, and no slowdowns. I will post test
numbers on big rig hardware in the coming weeks when I have the
batching code in more complete and stable shape.

IMPLEMENTATION

I have taken the approach of extending the existing hash join
algorithm, rather than introducing separate hash join executor nodes
or a fundamentally different algorithm. Here's a short description of
what the patch does:

1. SHARED HASH TABLE

To share data between participants, the patch uses two other patches I
have proposed: DSA areas[1]/messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com, which provide a higher level interface
to DSM segments to make programming with processes a little more like
programming with threads, and in particular a per-parallel-query DSA
area[2]/messages/by-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com that is made available for any executor node that needs some
shared work space.

The patch uses atomic operations to push tuples into the hash table
buckets while building, rehashing and loading, and then the hash table
is immutable during probing (except for match flags used to implement
outer joins). The existing memory chunk design is retained for dense
allocation of tuples, which provides a convenient way to rehash the
table when its size changes.

2. WORK COORDINATION

To coordinate parallel work, this patch uses two other patches:
barriers[3]/messages/by-id/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com, to implement a 'barrier' or 'phaser' synchronisation
primitive, and those in turn use the condition variables proposed by
Robert Haas.

Barriers provide a way for participants to break work up into phases
that they unanimously agree to enter together, which is a basic
requirement for parallelising hash joins. It is not safe to insert
into the hash table until exactly one participant has created it; it
is not safe to probe the hash table until all participants have
finished inserting into it; it is not safe to scan it for unmatched
tuples until all participants have finished probing it; it is not safe
to discard it and start loading the next batch until ... you get the
idea. You could also construct appropriate synchronisation using
various other interlocking primitives or flow control systems, but
fundamentally these wait points would exist at some level, and I think
this way is quite clean and simple. YMMV.

If we had exactly W workers and the leader didn't participate, then we
could use a simple simple pthread- or MPI-style barrier without an
explicit notion of 'phase'. We would simply take the existing hash
join code, add the shared hash table, add barrier waits at various
points and make sure that all participants always hit all of those
points in the same order, and it should All Just Work. But we have a
variable party size and a dual-role leader process, and I want to
highlight the specific problems that causes here because they increase
the patch size significantly:

Problem 1: We don't know how many workers will actually start. We
know how many were planned, but at execution time we may have
exhausted limits and actually get a smaller number. So we can't use
"static" barriers like the classic barriers in POSIX or MPI where the
group size is known up front. We need "dynamic" barriers with attach
and detach operations. As soon as you have varying party size you
need some kind of explicit model of the current phase, so that a new
participant can know what to do when it joins. For that reason, this
patch uses a phase number to track progress through the parallel hash
join. See MultiExecHash and ExecHashJoin which have switch statements
allowing a newly joined participant to synchronise their own state
machine and program counter with the phase.

Problem 2: One participant is not like the others: Gather may or may
not decide to run its subplan directly if the worker processes aren't
producing any tuples (and the proposed Gather Merge is the same). The
problem is that it also needs to consume tuples from the fixed-size
queues of the regular workers. A deadlock could arise if the leader's
plan blocks waiting for other participants while another participant
has filled its output queue and is waiting for the leader to consume.
One way to avoid such deadlocks is to follow the rule that the leader
should never wait for other participants if there is any possibility
that they have emitted tuples. The simplest way to do that would be
to have shared hash plans refuse to run in the leader by returning
NULL to signal the end of this partial tuple stream, but then we'd
lose a CPU compared to non-shared hash plans. The latest point the
leader can exit while respecting that rule is at the end of probing
the first batch. That is the approach taken by the patch currently.
See ExecHashCheckForEarlyExit for logic and discussion. It would be
better to be able to use the leader in later batches too, but as far
as I can see that'd require changes that are out of scope for this
patch. One idea would be an executor protocol change allowing plans
running in the leader to detach and yield, saying 'I have no further
tuples right now, but I'm not finished; try again later', and then
reattach when you call it back. Clearly that sails close to
asynchronous execution territory.

Problem 3: If the leader drops out after the first batch to solve
problem 2, then it may leave behind batch files which must be
processed by other participants. I had originally planned to defer
work on batch file sharing until a later iteration, thinking that it
would be a nice performance improvement to redistribute work from
uneven batch files, but it turns out to be necessary for correct
results because of participants exiting early. I am working on a very
simple batch sharing system to start with... Participants still
generate their own batch files, and then new operations BufFileExport
and BufFileImport are used to grant read-only access to the BufFile to
other participants. Each participant reads its own batch files
entirely and then tries to read from every other participant's batch
files until they are all exhausted, using a shared read head. The
per-tuple locking granularity, extra seeking and needless buffering in
every backend on batch file reads aren't great, and I'm still figuring
out temporary file cleanup/ownership semantics. There may be an
opportunity to make use of 'unified' BufFile concepts from Peter
Geoghegan's work, or create some new reusable shared tuple spilling
infrastructure.

3. COSTING

For now, I have introduced a GUC called cpu_shared_tuple_cost which
provides a straw-man model of the overhead of exchanging tuples via a
shared hash table, and the extra process coordination required. If
it's zero then a non-shared hash plan (ie multiple copies) has the
same cost as a shared hash plan, even though the non-shared hash plan
wastefully runs P copies of the plan. If cost represents runtime and
and we assume perfectly spherical cows running without interference
from each other, that makes some kind of sense, but it doesn't account
for the wasted resources and contention caused by running the same
plan in parallel. I don't know what to do about that yet. If
cpu_shared_tuple_cost is a positive number, as it probably should be
(more on that later), then shared hash tables look more expensive than
non-shared ones, which is technically true (CPU cache sharing etc) but
unhelpful because what you lose there you tend to gain by not running
all those plans in parallel. In other words cpu_shared_tuple_cost
doesn't really model the cost situation at all well, but it's a useful
GUC for development purposes for now as positive and negative numbers
can be used to turn the feature on and off for testing... As for
work_mem, it seems to me that 9.6 already established that work_mem is
a per participant limit, and it would be only fair to let a shared
plan use a total of work_mem * P too. I am still working on work_mem
accounting and reporting. Accounting for the parallelism in parallel
shared hash plans is easy though: their estimated tuple count is
already divided by P in the underlying partial path, and that is a
fairly accurate characterisation of what's going to happen at
execution time: it's often going to go a lot faster, and those plans
are the real goal of this work.

STATUS

Obviously this is a work in progress. I am actively working on the following:

* rescan
* batch number increases
* skew buckets
* costing model and policy/accounting for work_mem
* shared batch file reading
* preloading next batch
* debugging and testing
* tidying and refactoring

The basic approach is visible and simple cases are working though, so
I am submitting this WIP work for a round of review in the current
commitfest and hoping to get some feedback and ideas. I will post the
patch in a follow-up email shortly... Thanks for reading!

[1]: /messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
[2]: /messages/by-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com
[3]: /messages/by-id/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#1)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

Thomas Munro <thomas.munro@enterprisedb.com> wrote:

The basic approach is visible and simple cases are working though, so
I am submitting this WIP work for a round of review in the current
commitfest and hoping to get some feedback and ideas. I will post the
patch in a follow-up email shortly...

Aloha,

Please find a WIP patch attached. Everything related to batch reading
is not currently in a working state, which breaks multi-batch joins,
but many single batch cases work correctly. In an earlier version I
had multi-batch joins working but was before I started tackling
problems 2 and 3 listed in my earlier message. There is some error
handling and resource cleanup missing, and doubtless some cases not
handled correctly. But I thought it would be good to share this
development snapshot for discussion, so I'm posting this as is, and
will post an updated version when I've straightened out the batching
code some more.

To apply parallel-hash-v1, first apply the following patches, in this order:

condition-variable-v3.patch [1]/messages/by-id/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
remove-useless-barrier-header-v2.patch [2]/messages/by-id/CAEepm=1wrrzxh=SRCF_Hk4SZQ9BULy1vWsicx0EbgUf0B85vZQ@mail.gmail.com
barrier-v3.patch [2]/messages/by-id/CAEepm=1wrrzxh=SRCF_Hk4SZQ9BULy1vWsicx0EbgUf0B85vZQ@mail.gmail.com
dsa-v4.patch [3]/messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
dsa-area-for-executor-v1.patch [4]/messages/by-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com

When applying dsa-v4 on top of barrier-v3, it will reject a hunk in
src/backend/storage/ipc/Makefile where they both add their object
file. Simply add dsa.o to OBJS manually.

Then you can apply parallel-hash-v1.patch, which is attached to this message.

[1]: /messages/by-id/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
[2]: /messages/by-id/CAEepm=1wrrzxh=SRCF_Hk4SZQ9BULy1vWsicx0EbgUf0B85vZQ@mail.gmail.com
[3]: /messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
[4]: /messages/by-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-hash-v1.patchapplication/octet-stream; name=parallel-hash-v1.patchDownload

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0a669d9..1e7d369 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1023,7 +1023,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = sname = "Limit";
 			break;
 		case T_Hash:
-			pname = sname = "Hash";
+			if (((Hash *) plan)->shared_table)
+				pname = sname = "Shared Hash";
+			else
+				pname = sname = "Hash";
 			break;
 		default:
 			pname = sname = "???";
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 72bacd5..2d1ff2a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -27,6 +27,7 @@
 #include "executor/executor.h"
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
+#include "executor/nodeHashJoin.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/tqueue.h"
 #include "nodes/nodeFuncs.h"
@@ -203,6 +204,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 				ExecCustomScanEstimate((CustomScanState *) planstate,
 									   e->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinEstimate((HashJoinState *) planstate,
+									 e->pcxt);
+				break;
 			default:
 				break;
 		}
@@ -255,6 +260,9 @@ ExecParallelInitializeDSM(PlanState *planstate,
 				ExecCustomScanInitializeDSM((CustomScanState *) planstate,
 											d->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeDSM((HashJoinState *) planstate,
+										  d->pcxt);
 			default:
 				break;
 		}
@@ -724,6 +732,10 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
 				ExecCustomScanInitializeWorker((CustomScanState *) planstate,
 											   toc);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeWorker((HashJoinState *) planstate,
+											 toc);
+				break;
 			default:
 				break;
 		}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 6375d9b..1cc7f59 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -25,6 +25,7 @@
 #include <limits.h>
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "commands/tablespace.h"
 #include "executor/execdebug.h"
@@ -32,12 +33,13 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "port/atomics.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
-
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
@@ -47,8 +49,30 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 						uint32 hashvalue,
 						int bucketNumber);
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
+static void ExecHashRebucket(HashJoinTable hashtable);
+static void ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable);
+
+static void add_tuple_count(HashJoinTable hashtable, int count,
+							bool secondary);
+static HashJoinTuple next_tuple_in_bucket(HashJoinTable table,
+										  HashJoinTuple tuple);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int skew_bucket_no);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int bucket_no);
+static void insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+									 HashJoinTuple tuple,
+									 dsa_pointer tuple_pointer);
+static void insert_tuple_into_skew_bucket(HashJoinTable table,
+										  int bucket_no,
+										  HashJoinTuple tuple,
+										  dsa_pointer tuple_pointer);
 
 static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc_shared(HashJoinTable hashtable, Size size,
+								dsa_pointer *chunk_shared,
+								bool secondary);
+
 
 /* ----------------------------------------------------------------
  *		ExecHash
@@ -64,6 +88,100 @@ ExecHash(HashState *node)
 }
 
 /* ----------------------------------------------------------------
+ * 		ExecHashCheckForEarlyExit
+ *
+ *		return true if this process needs to abandon work on the
+ *		hash join to avoid a deadlock
+ * ----------------------------------------------------------------
+ */
+bool
+ExecHashCheckForEarlyExit(HashJoinTable hashtable)
+{
+	/*
+	 * The golden rule of leader deadlock avoidance: since leader processes
+	 * have two separate roles, namely reading from worker queues AND executing
+	 * the same plan as workers, we must never allow a leader to wait for
+	 * workers if there is any possibility those workers have emitted tuples.
+	 * Otherwise we could get into a situation where a worker fills up its
+	 * output tuple queue and begins waiting for the leader to read, while
+	 * the leader is busy waiting for the worker.
+	 *
+	 * Parallel hash joins with shared tables are inherently susceptible to
+	 * such deadlocks because there are points at which all participants must
+	 * wait (you can't start check for unmatched tuples in the hash table until
+	 * probing has completed in all workers, etc).
+	 *
+	 * So we follow these rules:
+	 *
+	 * 1.  If there are workers participating, the leader MUST NOT not
+	 *     participate in any further work after probing the first batch, so
+	 *     that it never has to wait for workers that might have emitted
+	 *     tuples.
+	 *
+	 * 2.  If there are no workers participating, the leader MUST run all the
+	 *     batches to completion, because that's the only way for the join
+	 *     to complete.  There is no deadlock risk if there are no workers.
+	 *
+	 * 3.  Workers MUST NOT participate if the hashing phase has finished by
+	 *     the time they have joined, so that the leader can reliably determine
+	 *     whether there are any workers running when it comes to the point
+	 *     where it must choose between 1 and 2.
+	 *
+	 * In other words, if the leader makes it all the way through hashing and
+	 * probing before any workers show up, then the leader will run the whole
+	 * hash join on its own.  If workers do show up any time before hashing is
+	 * finished, the leader will stop executing the join after helping probe
+	 * the first batch.   In the unlikely event of the first worker showing up
+	 * after the leader has finished hashing, it will exit because it's too
+	 * late, the leader has already decided to do all the work alone.
+	 */
+
+	if (!IsParallelWorker())
+	{
+		/* Running in the leader process. */
+		if (BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_PROBING &&
+			hashtable->shared->at_least_one_worker)
+		{
+			/* Abandon ship due to rule 1.  There are workers running. */
+			hashtable->detached_early = true;
+		}
+		else
+		{
+			/*
+			 * Continue processing due to rule 2.  There are no workers, and
+			 * any workers that show up later will abandon ship.
+			 */
+		}
+	}
+	else
+	{
+		/* Running in a worker process. */
+		if (hashtable->attached_at_phase < PHJ_PHASE_PROBING)
+		{
+			/*
+			 * Advertise that there are workers, so that the leader can
+			 * choose between rules 1 and 2.  It's OK that several workers can
+			 * write to this variable without immediately memory
+			 * synchronization, because the leader will only read it in a later
+			 * phase (see above).
+			 */
+			hashtable->shared->at_least_one_worker = true;
+		}
+		else
+		{
+			/* Abandon ship due to rule 3. */
+			hashtable->detached_early = true;
+		}
+	}
+
+	/* If we decided to exit early, detach now. */
+	if (hashtable->detached_early)
+		BarrierDetach(&hashtable->shared->barrier);
+
+	return hashtable->detached_early;
+}
+
+/* ----------------------------------------------------------------
  *		MultiExecHash
  *
  *		build hash table for hashjoin, doing partitioning if more
@@ -79,6 +197,7 @@ MultiExecHash(HashState *node)
 	TupleTableSlot *slot;
 	ExprContext *econtext;
 	uint32		hashvalue;
+	Barrier	   *barrier = NULL;
 
 	/* must provide our own instrumentation support */
 	if (node->ps.instrument)
@@ -90,6 +209,55 @@ MultiExecHash(HashState *node)
 	outerNode = outerPlanState(node);
 	hashtable = node->hashtable;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Synchronize parallel hash table builds.  At this stage we know that
+		 * the shared hash table has been created, but we don't know if our
+		 * peers are still in MultiExecHash and if so how far through.  We use
+		 * the phase to synchronize with them.
+		 */
+		barrier = &hashtable->shared->barrier;
+
+		switch (BarrierPhase(barrier))
+		{
+		case PHJ_PHASE_INIT:
+			/* ExecHashTableCreate already handled this phase. */
+			Assert(false);
+		case PHJ_PHASE_CREATING:
+			/* Wait for serial phase, and then either hash or wait. */
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_CREATING))
+				goto hash;
+			else if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_HASHING:
+			/* Hashing is already underway.  Can we join in? */
+			if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_RESIZING:
+			/* Can't help with serial phase. */
+			goto post_resize;
+		case PHJ_PHASE_REBUCKETING:
+			/* Rebucketing is in progress.  Let's help do that. */
+			goto rebucket;
+		default:
+			/* The hash table building work is already finished. */
+			goto finish;
+		}
+	}
+
+ hash:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Make sure our local hashtable is up-to-date so we can hash. */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/*
 	 * set expression context
 	 */
@@ -123,22 +291,98 @@ MultiExecHash(HashState *node)
 			else
 			{
 				/* Not subject to skew optimization, so insert normally */
-				ExecHashTableInsert(hashtable, slot, hashvalue);
+				ExecHashTableInsert(hashtable, slot, hashvalue, false);
 			}
-			hashtable->totalTuples += 1;
+			/*
+			 * Shared tuple counters are managed by dense_alloc_shared.  For
+			 * private hash tables we maintain the counter here.
+			 */
+			if (!HashJoinTableIsShared(hashtable))
+				hashtable->totalTuples += 1;
 		}
 	}
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Update shared tuple count for the current chunk.  Other chunks are
+		 * accounted for already, when new chunks are allocated.
+		 */
+		if (hashtable->primary_chunk != NULL)
+			add_tuple_count(hashtable, hashtable->primary_chunk->ntuples,
+							false);
+	}
+
+ post_hash:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		bool elected_to_resize;
+
+		/*
+		 * Wait for all backends to finish hashing.  If only one worker is
+		 * running the hashing phase because of a non-partial inner plan, the
+		 * other workers will pile up here waiting.  If multiple worker are
+		 * hashing, they should finish close to each other in time.
+		 */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		elected_to_resize = BarrierWait(barrier, WAIT_EVENT_HASH_HASHING);
+		/*
+		 * Resizing is a serial phase.  All but one should skip ahead to
+		 * rebucketing, but all workers should update their copy of the shared
+		 * tuple count with the final total first.
+		 */
+		hashtable->totalTuples =
+			pg_atomic_read_u64(&hashtable->shared->total_primary_tuples);
+		if (!elected_to_resize)
+			goto post_resize;
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+	}
+
 	/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-	if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-		ExecHashIncreaseNumBuckets(hashtable);
+	ExecHashIncreaseNumBuckets(hashtable);
+
+ post_resize:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+		BarrierWait(&hashtable->shared->barrier,
+					WAIT_EVENT_HASH_RESIZING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+	}
+
+ rebucket:
+	/* If the table was resized, insert tuples into the new buckets. */
+	ExecHashUpdate(hashtable);
+	ExecHashRebucket(hashtable);
 
 	/* Account for the buckets in spaceUsed (reported in EXPLAIN ANALYZE) */
-	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinTuple);
+	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinBucketHead);
 	if (hashtable->spaceUsed > hashtable->spacePeak)
 		hashtable->spacePeak = hashtable->spaceUsed;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+		BarrierWait(barrier, WAIT_EVENT_HASH_REBUCKETING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING);
+	}
+
+ finish:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * All hashing work has finished.  The other workers may be probing or
+		 * processing unmatched tuples for the initial batch, or dealing with
+		 * later batches.  The next synchronization point is in ExecHashJoin's
+		 * HJ_BUILD_HASHTABLE case, which will figure that out and synchronize
+		 * its local state machine with the parallel processing group's phase.
+		 */
+		Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/* must provide our own instrumentation support */
+	/* TODO: report only the tuples that WE hashed here? */
 	if (node->ps.instrument)
 		InstrStopNode(node->ps.instrument, hashtable->totalTuples);
 
@@ -243,8 +487,9 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
+ExecHashTableCreate(HashState *state, List *hashOperators, bool keepNulls)
 {
+	Hash	   *node;
 	HashJoinTable hashtable;
 	Plan	   *outerNode;
 	int			nbuckets;
@@ -261,6 +506,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 * "outer" subtree of this node, but the inner relation of the hashjoin).
 	 * Compute the appropriate size of the hash table.
 	 */
+	node = (Hash *) state->ps.plan;
 	outerNode = outerPlan(node);
 
 	ExecChooseHashTableSize(outerNode->plan_rows, outerNode->plan_width,
@@ -305,7 +551,13 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	hashtable->spaceUsedSkew = 0;
 	hashtable->spaceAllowedSkew =
 		hashtable->spaceAllowed * SKEW_WORK_MEM_PERCENT / 100;
-	hashtable->chunks = NULL;
+	hashtable->primary_chunk = NULL;
+	hashtable->secondary_chunk = NULL;
+	hashtable->chunks_to_rebucket = NULL;
+	hashtable->primary_chunk_shared = InvalidDsaPointer;
+	hashtable->secondary_chunk_shared = InvalidDsaPointer;
+	hashtable->area = state->ps.state->es_query_area;
+	hashtable->shared = state->shared_table_data;
 
 #ifdef HJDEBUG
 	printf("Hashjoin %p: initial nbatch = %d, nbuckets = %d\n",
@@ -368,23 +620,101 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 		PrepareTempTablespaces();
 	}
 
-	/*
-	 * Prepare context for the first-scan space allocations; allocate the
-	 * hashbucket array therein, and set each bucket "empty".
-	 */
-	MemoryContextSwitchTo(hashtable->batchCxt);
+	MemoryContextSwitchTo(oldcxt);
 
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier;
 
-	/*
-	 * Set up for skew optimization, if possible and there's a need for more
-	 * than one batch.  (In a one-batch join, there's no point in it.)
-	 */
-	if (nbatch > 1)
-		ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+		/*
+		 * Attach to the barrier.  The corresponding detach operation is in
+		 * ExecHashTableDestroy.
+		 */
+		barrier = &hashtable->shared->barrier;
+		hashtable->attached_at_phase = BarrierAttach(barrier);
 
-	MemoryContextSwitchTo(oldcxt);
+		/*
+		 * So far we have no idea whether there are any other workers, and if
+		 * so, what phase they are working on.  The only thing we care about
+		 * at this point is whether someone has already created the shared
+		 * hash table yet.  If not, one backend will be elected to do that
+		 * now.
+		 */
+		if (BarrierPhase(barrier) == PHJ_PHASE_INIT)
+		{
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_INIT))
+			{
+				/* Serial phase: create the hash tables */
+				Size bytes;
+				HashJoinBucketHead *buckets;
+				int i;
+				SharedHashJoinTable shared;
+				dsa_area *area;
+
+				shared = hashtable->shared;
+				area = hashtable->area;
+				bytes = nbuckets * sizeof(HashJoinBucketHead);
+
+				/* Allocate the primary and secondary hash tables. */
+				shared->primary_buckets = dsa_allocate(area, bytes);
+				shared->secondary_buckets = dsa_allocate(area, bytes);
+				if (!DsaPointerIsValid(shared->primary_buckets) ||
+					!DsaPointerIsValid(shared->secondary_buckets))
+					ereport(ERROR,
+							(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+							 errmsg("out of memory")));
+
+				/* Set up primary table's buckets. */
+				buckets = dsa_get_address(area, shared->primary_buckets);
+				for (i = 0; i < nbuckets; ++i)
+					dsa_pointer_atomic_init(&buckets[i].shared,
+											InvalidDsaPointer);
+				/* Set up secondary table's buckets. */
+				buckets = dsa_get_address(area, shared->secondary_buckets);
+				for (i = 0; i < nbuckets; ++i)
+					dsa_pointer_atomic_init(&buckets[i].shared,
+											InvalidDsaPointer);
+
+				/* Initialize the rest of parallel_state. */
+				hashtable->shared->nbuckets = nbuckets;
+				pg_atomic_write_u32(&hashtable->shared->next_unmatched_bucket,
+									0);
+				/* TODO: ExecHashBuildSkewHash */
+
+				ExecHashJoinResetBatchReaders(hashtable);
+
+				/*
+				 * The backend-local pointers in hashtable will be set up by
+				 * ExecHashUpdate, at each point where they might have
+				 * changed.
+				 */
+			}
+			Assert(BarrierPhase(&hashtable->shared->barrier) ==
+				   PHJ_PHASE_CREATING);
+			/* The next synchronization point is in MultiExecHash. */
+		}
+	}
+	else
+	{
+		/*
+		 * Prepare context for the first-scan space allocations; allocate the
+		 * hashbucket array therein, and set each bucket "empty".
+		 */
+		MemoryContextSwitchTo(hashtable->batchCxt);
+
+		hashtable->buckets = (HashJoinBucketHead *)
+			palloc0(nbuckets * sizeof(HashJoinBucketHead));
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/*
+		 * Set up for skew optimization, if possible and there's a need for
+		 * more than one batch.  (In a one-batch join, there's no point in
+		 * it.)
+		 */
+		if (nbatch > 1)
+			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+	}
 
 	return hashtable;
 }
@@ -564,6 +894,49 @@ ExecHashTableDestroy(HashJoinTable hashtable)
 {
 	int			i;
 
+	/* Detached, if we haven't already. */
+	if (HashJoinTableIsShared(hashtable) && !hashtable->detached_early)
+	{
+		Barrier *barrier = &hashtable->shared->barrier;
+
+		/*
+		 * We can't make any assertions about the phase here, because we could
+		 * be destroyed mid-probing due to a Limit clause, or after running
+		 * out of work, or as a leading having decided to exit early.  Instead
+		 * we just detach from the barrier, and let the last participant to
+		 * detach clean up.
+		 */
+
+		if (BarrierWait(barrier, WAIT_EVENT_HASH_DESTROY))
+		{
+			/* Serial: free the tables */
+			if (DsaPointerIsValid(hashtable->shared->primary_buckets))
+			{
+				dsa_free(hashtable->area,
+						 hashtable->shared->primary_buckets);
+				hashtable->shared->primary_buckets = InvalidDsaPointer;
+			}
+			if (DsaPointerIsValid(hashtable->shared->secondary_buckets))
+			{
+				dsa_free(hashtable->area,
+						 hashtable->shared->secondary_buckets);
+				hashtable->shared->secondary_buckets = InvalidDsaPointer;
+			}
+
+
+			/* This isn't a real phase: it's "past the end". */
+			/*
+			elog(LOG, "XXX ExecHashTableDestroy nbatch = %d", hashtable->curbatch);
+			elog(LOG, "XXX ExecHashTableDestroy expected %d got %d", BarrierPhase(barrier), PHJ_PHASE_PROMOTING_BATCH(hashtable->curbatch + 1));
+			Assert(BarrierPhase(barrier) ==
+				   PHJ_PHASE_PROMOTING_BATCH(hashtable->curbatch + 1));
+			*/
+			/* TODO: reinitialize barrier for rescan! */
+			/* TODO: free chunks? */
+		}
+		BarrierDetach(&hashtable->shared->barrier);
+	}
+
 	/*
 	 * Make sure all the temp files are closed.  We skip batch 0, since it
 	 * can't have any temp files (and the arrays might not even exist if
@@ -600,6 +973,18 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	long		nfreed;
 	HashMemoryChunk oldchunks;
 
+	/*
+	 * TODO:TM this will be done incrementally for shared tables; for now it
+	 * is disabled!  Current idea: the chain of memory chunks can be shifted
+	 * to another list of memory chunks to be rebatched, and other workers
+	 * that are busy hashing can see that it's non-empty, and pop chunks off
+	 * to rebatch.  This way we can fan out the expensive rebatching work, but
+	 * potentially requires more than one hash table active at a time.  More
+	 * study required.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+		return;
+
 	/* do nothing if we've decided to shut off growth */
 	if (!hashtable->growEnabled)
 		return;
@@ -670,13 +1055,13 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	 * already been processed. We will free the old chunks as we go.
 	 */
 	memset(hashtable->buckets, 0, sizeof(HashJoinTuple) * hashtable->nbuckets);
-	oldchunks = hashtable->chunks;
-	hashtable->chunks = NULL;
+	oldchunks = hashtable->primary_chunk;
+	hashtable->primary_chunk = NULL;
 
 	/* so, let's scan through the old chunks, and all tuples in each chunk */
 	while (oldchunks != NULL)
 	{
-		HashMemoryChunk nextchunk = oldchunks->next;
+		HashMemoryChunk nextchunk = oldchunks->next.private;
 
 		/* position within the buffer (up to oldchunks->used) */
 		size_t		idx = 0;
@@ -699,20 +1084,23 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 				/* keep tuple in memory - copy it into the new chunk */
 				HashJoinTuple copyTuple;
 
-				copyTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+				copyTuple = (HashJoinTuple)
+					dense_alloc(hashtable, hashTupleSize);
 				memcpy(copyTuple, hashTuple, hashTupleSize);
 
 				/* and add it back to the appropriate bucket */
-				copyTuple->next = hashtable->buckets[bucketno];
-				hashtable->buckets[bucketno] = copyTuple;
+				insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+										 InvalidDsaPointer);
 			}
 			else
 			{
 				/* dump it out */
 				Assert(batchno > curbatch);
-				ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple),
+				ExecHashJoinSaveTuple(hashtable,
+									  HJTUPLE_MINTUPLE(hashTuple),
 									  hashTuple->hashvalue,
-									  &hashtable->innerBatchFile[batchno]);
+									  batchno,
+									  true);
 
 				hashtable->spaceUsed -= hashTupleSize;
 				nfreed++;
@@ -758,8 +1146,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 static void
 ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 {
-	HashMemoryChunk chunk;
-
 	/* do nothing if not an increase (it's called increase for a reason) */
 	if (hashtable->nbuckets >= hashtable->nbuckets_optimal)
 		return;
@@ -780,16 +1166,156 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 	 * Just reallocate the proper number of buckets - we don't need to walk
 	 * through them - we can walk the dense-allocated chunks (just like in
 	 * ExecHashIncreaseNumBatches, but without all the copying into new
-	 * chunks)
+	 * chunks): see ExecHashRebucket, which must be called next.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Size bytes;
+		int i;
+
+		/* Serial phase: only one backend reallocates. */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_RESIZING);
+
+		/* Free the old arrays. */
+		dsa_free(hashtable->area,
+				 hashtable->shared->primary_buckets);
+		dsa_free(hashtable->area,
+				 hashtable->shared->secondary_buckets);
+		/* Allocate replacements. */
+		bytes = hashtable->nbuckets * sizeof(HashJoinBucketHead);
+		hashtable->shared->primary_buckets =
+			dsa_allocate(hashtable->area, bytes);
+		hashtable->shared->secondary_buckets =
+			dsa_allocate(hashtable->area, bytes);
+		if (!DsaPointerIsValid(hashtable->shared->primary_buckets) ||
+			!DsaPointerIsValid(hashtable->shared->secondary_buckets))
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("out of memory")));
+		/* Initialize empty buckets. */
+		hashtable->buckets =
+			dsa_get_address(hashtable->area,
+							hashtable->shared->primary_buckets);
+		for (i = 0; i < hashtable->nbuckets; ++i)
+			dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+									 InvalidDsaPointer);
+		hashtable->next_buckets =
+			dsa_get_address(hashtable->area,
+							hashtable->shared->secondary_buckets);
+		for (i = 0; i < hashtable->nbuckets; ++i)
+			dsa_pointer_atomic_write(&hashtable->next_buckets[i].shared,
+									 InvalidDsaPointer);
+		hashtable->shared->nbuckets = hashtable->nbuckets;
+		/* Move all primary chunks to the rebucket list. */
+		dsa_pointer_atomic_write(&hashtable->shared->chunks_to_rebucket,
+								 dsa_pointer_atomic_read(&hashtable->shared->head_primary_chunk));
+		dsa_pointer_atomic_write(&hashtable->shared->head_primary_chunk,
+								 InvalidDsaPointer);
+	}
+	else
+	{
+		hashtable->buckets =
+			(HashJoinBucketHead *) repalloc(hashtable->buckets,
+											hashtable->nbuckets * sizeof(HashJoinBucketHead));
+
+		memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinBucketHead));
+		/* Move all chunks to the rebucket list. */
+		hashtable->chunks_to_rebucket = hashtable->primary_chunk;
+		hashtable->primary_chunk = NULL;
+	}
+}
+
+/*
+ * Pop a memory chunk from a given list atomically.  Returns a backend-local
+ * pointer to the chunk, or NULL if the list is empty.  Also sets *chunk_out
+ * to the dsa_pointer to the chunk.
+ */
+static HashMemoryChunk
+ExecHashPopChunk(HashJoinTable hashtable,
+				 dsa_pointer *chunk_out,
+				 dsa_pointer_atomic *head)
+{
+	HashMemoryChunk chunk = NULL;
+
+	/*
+	 * We could see a stale empty list and exist early without a barrier, so
+	 * explicitly include one before we read the head of the list for the
+	 * first time.
 	 */
-	hashtable->buckets =
-		(HashJoinTuple *) repalloc(hashtable->buckets,
-								hashtable->nbuckets * sizeof(HashJoinTuple));
+	pg_read_barrier();
 
-	memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinTuple));
+	for (;;)
+	{
+		*chunk_out = dsa_pointer_atomic_read(head);
+		if (!DsaPointerIsValid(*chunk_out))
+		{
+			chunk = NULL;
+			break;
+		}
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, *chunk_out);
+		if (dsa_pointer_atomic_compare_exchange(head,
+												chunk_out,
+												chunk->next.shared))
+			break;
+	}
 
-	/* scan through all tuples in all chunks to rebuild the hash table */
-	for (chunk = hashtable->chunks; chunk != NULL; chunk = chunk->next)
+	return chunk;
+}
+
+/*
+ * Push a shared memory chunk onto a given list atomically.
+ */
+static void
+ExecHashPushChunk(HashJoinTable hashtable,
+				  HashMemoryChunk chunk,
+				  dsa_pointer chunk_shared,
+				  dsa_pointer_atomic *head)
+{
+	Assert(chunk == dsa_get_address(hashtable->area, chunk_shared));
+
+	for (;;)
+	{
+		chunk->next.shared = dsa_pointer_atomic_read(head);
+		if (dsa_pointer_atomic_compare_exchange(head,
+												&chunk->next.shared,
+												chunk_shared))
+			break;
+	}
+}
+
+/*
+ * ExecHashRebucket
+ *		insert the tuples from all chunks into the correct bucket
+ */
+static void
+ExecHashRebucket(HashJoinTable hashtable)
+{
+	HashMemoryChunk chunk;
+	dsa_pointer chunk_shared;
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * This is a parallel phase.  Workers will atomically pop one chunk at
+		 * a time and rebucket all of its tuples.
+		 */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_REBUCKETING);
+	}
+
+	/*
+	 * Scan through all tuples in all chunks in the rebucket list to rebuild
+	 * the hash table.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+		chunk =
+			ExecHashPopChunk(hashtable, &chunk_shared,
+							 &hashtable->shared->chunks_to_rebucket);
+	else
+		chunk = hashtable->chunks_to_rebucket;
+	while (chunk != NULL)
 	{
 		/* process all tuples stored in this chunk */
 		size_t		idx = 0;
@@ -797,6 +1323,8 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 		while (idx < chunk->used)
 		{
 			HashJoinTuple hashTuple = (HashJoinTuple) (chunk->data + idx);
+			dsa_pointer hashTuple_shared = chunk_shared +
+				offsetof(HashMemoryChunkData, data) + idx;
 			int			bucketno;
 			int			batchno;
 
@@ -804,16 +1332,52 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 									  &bucketno, &batchno);
 
 			/* add the tuple to the proper bucket */
-			hashTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = hashTuple;
+			insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+									 hashTuple_shared);
 
 			/* advance index past the tuple */
 			idx += MAXALIGN(HJTUPLE_OVERHEAD +
 							HJTUPLE_MINTUPLE(hashTuple)->t_len);
 		}
+
+		/* Push chunk onto regular list and move to next chunk. */
+		if (HashJoinTableIsShared(hashtable))
+		{
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_primary_chunk);
+			chunk =
+				ExecHashPopChunk(hashtable, &chunk_shared,
+								 &hashtable->shared->chunks_to_rebucket);
+		}
+		else
+		{
+			HashMemoryChunk next = chunk->next.private;
+
+			chunk->next.private = hashtable->primary_chunk;
+			hashtable->primary_chunk = chunk;
+			chunk = next;
+		}
 	}
 }
 
+static void
+ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable)
+{
+	double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+
+	/*
+	 * Guard against integer overflow and alloc size overflow.  The
+	 * MaxAllocSize limitation doesn't really apply for shared hash tables,
+	 * since DSA has no such limit, but for now let's apply the same limit.
+	 */
+	while (ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET) &&
+		   hashtable->nbuckets_optimal <= INT_MAX / 2 &&
+		   hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinBucketHead))
+	{
+		hashtable->nbuckets_optimal *= 2;
+		hashtable->log2_nbuckets_optimal += 1;
+	}
+}
 
 /*
  * ExecHashTableInsert
@@ -829,7 +1393,8 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 void
 ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue)
+					uint32 hashvalue,
+					bool secondary)
 {
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	int			bucketno;
@@ -848,11 +1413,17 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		 */
 		HashJoinTuple hashTuple;
 		int			hashTupleSize;
-		double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+		dsa_pointer hashTuple_shared = InvalidDsaPointer;
 
 		/* Create the HashJoinTuple */
 		hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-		hashTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+		if (HashJoinTableIsShared(hashtable))
+			hashTuple = (HashJoinTuple)
+				dense_alloc_shared(hashtable, hashTupleSize,
+								   &hashTuple_shared, secondary);
+		else
+			hashTuple = (HashJoinTuple)
+				dense_alloc(hashtable, hashTupleSize);
 
 		hashTuple->hashvalue = hashvalue;
 		memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
@@ -866,25 +1437,16 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 		/* Push it onto the front of the bucket's list */
-		hashTuple->next = hashtable->buckets[bucketno];
-		hashtable->buckets[bucketno] = hashTuple;
+		insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+								 hashTuple_shared);
 
 		/*
 		 * Increase the (optimal) number of buckets if we just exceeded the
 		 * NTUP_PER_BUCKET threshold, but only when there's still a single
 		 * batch.
 		 */
-		if (hashtable->nbatch == 1 &&
-			ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET))
-		{
-			/* Guard against integer overflow and alloc size overflow */
-			if (hashtable->nbuckets_optimal <= INT_MAX / 2 &&
-				hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinTuple))
-			{
-				hashtable->nbuckets_optimal *= 2;
-				hashtable->log2_nbuckets_optimal += 1;
-			}
-		}
+		if (hashtable->nbatch == 1)
+			ExecHashTableComputeOptimalBuckets(hashtable);
 
 		/* Account for space used, and back off if we've used too much */
 		hashtable->spaceUsed += hashTupleSize;
@@ -901,9 +1463,11 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		 * put the tuple into a temp file for later batches
 		 */
 		Assert(batchno > hashtable->curbatch);
-		ExecHashJoinSaveTuple(tuple,
+		ExecHashJoinSaveTuple(hashtable,
+							  tuple,
 							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+							  batchno,
+							  true);
 	}
 }
 
@@ -1047,6 +1611,138 @@ ExecHashGetBucketAndBatch(HashJoinTable hashtable,
 }
 
 /*
+ * Update the local hashtable with the current pointers and sizes from
+ * hashtable->parallel_state.
+ */
+void
+ExecHashUpdate(HashJoinTable hashtable)
+{
+	Barrier *barrier;
+
+	if (!HashJoinTableIsShared(hashtable))
+		return;
+
+	barrier = &hashtable->shared->barrier;
+
+	/*
+	 * This should only be called in a phase when the hash table is not being
+	 * mutated (ie resized, swapped etc).
+	 */
+	Assert(!PHJ_PHASE_MUTATING_TABLE(
+		BarrierPhase(&hashtable->shared->barrier)));
+
+	/* The primary hash table. */
+	hashtable->buckets = (HashJoinBucketHead *)
+		dsa_get_address(hashtable->area,
+						hashtable->shared->primary_buckets);
+	hashtable->nbuckets = hashtable->shared->nbuckets;
+	hashtable->log2_nbuckets = my_log2(hashtable->nbuckets);
+	/* The secondary hash table, if there is one (NULL for initial batch). */
+	hashtable->next_buckets = (HashJoinBucketHead *)
+		dsa_get_address(hashtable->area,
+						hashtable->shared->secondary_buckets);
+
+	hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(BarrierPhase(barrier));
+}
+
+/*
+ * Get the next tuple in the same bucket as 'tuple'.
+ */
+static HashJoinTuple
+next_tuple_in_bucket(HashJoinTable table, HashJoinTuple tuple)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area, tuple->next.shared);
+	else
+		return tuple->next.private;
+}
+
+/*
+ * Get the first tuple in a given skew bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_skew_bucket(HashJoinTable table, int skew_bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area,
+							table->skewBucket[skew_bucket_no]->tuples.shared);
+	else
+		return table->skewBucket[skew_bucket_no]->tuples.private;
+}
+
+/*
+ * Get the first tuple in a given bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_bucket(HashJoinTable table, int bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		dsa_pointer p =
+			dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+		return (HashJoinTuple) dsa_get_address(table->area, p);
+	}
+	else
+		return table->buckets[bucket_no].private;
+}
+
+/*
+ * Insert a tuple at the front of a given bucket identified by number.  For
+ * shared hash joins, tuple_shared must be provided, pointing to the tuple in
+ * the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+						 HashJoinTuple tuple, dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		Assert(tuple == dsa_get_address(table->area, tuple_shared));
+		for (;;)
+		{
+			tuple->next.shared =
+				dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+			if (dsa_pointer_atomic_compare_exchange(&table->buckets[bucket_no].shared,
+													&tuple->next.shared,
+													tuple_shared))
+				break;
+		}
+	}
+	else
+	{
+		tuple->next.private = table->buckets[bucket_no].private;
+		table->buckets[bucket_no].private = tuple;
+	}
+}
+
+/*
+ * Insert a tuple at the front of a given skew bucket identified by number.
+ * For shared hash joins, tuple_shared must be provided, pointing to the tuple
+ * in the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_skew_bucket(HashJoinTable table, int skew_bucket_no,
+							  HashJoinTuple tuple,
+							  dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		tuple->next.shared =
+			table->skewBucket[skew_bucket_no]->tuples.shared;
+		table->skewBucket[skew_bucket_no]->tuples.shared = tuple_shared;
+	}
+	else
+	{
+		tuple->next.private = table->skewBucket[skew_bucket_no]->tuples.private;
+		table->skewBucket[skew_bucket_no]->tuples.private = tuple;
+	}
+}
+
+/*
  * ExecScanHashBucket
  *		scan a hash bucket for matches to the current outer tuple
  *
@@ -1073,11 +1769,12 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	 * otherwise scan the standard hashtable bucket.
 	 */
 	if (hashTuple != NULL)
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	else if (hjstate->hj_CurSkewBucketNo != INVALID_SKEW_BUCKET_NO)
-		hashTuple = hashtable->skewBucket[hjstate->hj_CurSkewBucketNo]->tuples;
+		hashTuple = first_tuple_in_skew_bucket(hashtable,
+											   hjstate->hj_CurSkewBucketNo);
 	else
-		hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
+		hashTuple = first_tuple_in_bucket(hashtable, hjstate->hj_CurBucketNo);
 
 	while (hashTuple != NULL)
 	{
@@ -1101,7 +1798,7 @@ ExecScanHashBucket(HashJoinState *hjstate,
 			}
 		}
 
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	}
 
 	/*
@@ -1144,6 +1841,21 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		int phase PG_USED_FOR_ASSERTS_ONLY;
+
+		/*
+		 * TODO: This walks the buckets in parallel mode, like the existing
+		 * code, but it might make more sense to hand out chunks to workers
+		 * instead of buckets.
+		 */
+
+		phase = BarrierPhase(&hashtable->shared->barrier);
+		Assert(PHJ_PHASE_TO_SUBPHASE(phase) == PHJ_SUBPHASE_UNMATCHED);
+		Assert(PHJ_PHASE_TO_BATCHNO(phase) == hashtable->curbatch);
+	}
+
 	for (;;)
 	{
 		/*
@@ -1152,21 +1864,35 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 		 * bucket.
 		 */
 		if (hashTuple != NULL)
-			hashTuple = hashTuple->next;
-		else if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
+		else if (HashJoinTableIsShared(hashtable))
 		{
-			hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
-			hjstate->hj_CurBucketNo++;
+			int bucketno =
+				(int) pg_atomic_fetch_add_u32(
+					&hashtable->shared->next_unmatched_bucket, 1);
+
+			if (bucketno > hashtable->nbuckets)
+				break;			/* finished all buckets */
+
+			hashTuple = first_tuple_in_bucket(hashtable, bucketno);
+
+			/* TODO: parallel skew bucket support */
 		}
-		else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
+		else
 		{
-			int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
+			if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+				hashTuple = first_tuple_in_bucket(hashtable,
+												  hjstate->hj_CurBucketNo++);
+			else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
+			{
+				int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
 
-			hashTuple = hashtable->skewBucket[j]->tuples;
-			hjstate->hj_CurSkewBucketNo++;
+				hashTuple = first_tuple_in_skew_bucket(hashtable, j);
+				hjstate->hj_CurSkewBucketNo++;
+			}
+			else
+				break;				/* finished all buckets */
 		}
-		else
-			break;				/* finished all buckets */
 
 		while (hashTuple != NULL)
 		{
@@ -1191,7 +1917,7 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 				return true;
 			}
 
-			hashTuple = hashTuple->next;
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		}
 	}
 
@@ -1212,6 +1938,52 @@ ExecHashTableReset(HashJoinTable hashtable)
 	MemoryContext oldcxt;
 	int			nbuckets = hashtable->nbuckets;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Wait for all workers to finish accessing the primary hash table. */
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_UNMATCHED);
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_UNMATCHED))
+		{
+			/* Serial phase: promote the secondary table to primary. */
+			dsa_pointer tmp;
+			int i;
+
+			Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+				   PHJ_SUBPHASE_PROMOTING);
+
+			/* Clear the old primary table. */
+			for (i = 0; i < nbuckets; ++i)
+				dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+										 InvalidDsaPointer);
+
+			/* Swap the two tables. */
+			tmp = hashtable->shared->primary_buckets;
+			hashtable->shared->primary_buckets =
+				hashtable->shared->secondary_buckets;
+			hashtable->shared->secondary_buckets = tmp;
+
+			/* Swap the chunk lists. */
+			tmp = dsa_pointer_atomic_read(&hashtable->shared->head_primary_chunk);
+			dsa_pointer_atomic_write(&hashtable->shared->head_primary_chunk,
+									 dsa_pointer_atomic_read(&hashtable->shared->head_secondary_chunk));
+			dsa_pointer_atomic_write(&hashtable->shared->head_secondary_chunk,
+									 tmp);
+
+			/* TODO: Free the secondary chunks. */
+			/* TODO: Or put them on a freelist instead? */
+
+			pg_atomic_write_u32(&hashtable->shared->next_unmatched_bucket,
+								0);
+		}
+		/* Wait again, so that all workers now have the new table. */
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_PROMOTING);
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_LOADING);
+		ExecHashUpdate(hashtable);
+		return;
+	}
+
 	/*
 	 * Release all the hash buckets and tuples acquired in the prior pass, and
 	 * reinitialize the context for a new pass.
@@ -1220,15 +1992,15 @@ ExecHashTableReset(HashJoinTable hashtable)
 	oldcxt = MemoryContextSwitchTo(hashtable->batchCxt);
 
 	/* Reallocate and reinitialize the hash bucket headers. */
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	hashtable->buckets = (HashJoinBucketHead *)
+		palloc0(nbuckets * sizeof(HashJoinBucketHead));
 
 	hashtable->spaceUsed = 0;
 
 	MemoryContextSwitchTo(oldcxt);
 
 	/* Forget the chunks (the memory was freed by the context reset above). */
-	hashtable->chunks = NULL;
+	hashtable->primary_chunk = NULL;
 }
 
 /*
@@ -1241,10 +2013,14 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	HashJoinTuple tuple;
 	int			i;
 
+	/* TODO: share parallel reset work!  coordinate! */
+
 	/* Reset all flags in the main table ... */
 	for (i = 0; i < hashtable->nbuckets; i++)
 	{
-		for (tuple = hashtable->buckets[i]; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_bucket(hashtable, i);
+			 tuple != NULL;
+			 next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 
@@ -1252,9 +2028,10 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	for (i = 0; i < hashtable->nSkewBuckets; i++)
 	{
 		int			j = hashtable->skewBucketNums[i];
-		HashSkewBucket *skewBucket = hashtable->skewBucket[j];
 
-		for (tuple = skewBucket->tuples; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_skew_bucket(hashtable, j);
+			 tuple != NULL;
+			 tuple = next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 }
@@ -1414,11 +2191,11 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 				continue;
 
 			/* Okay, create a new skew bucket for this hashvalue. */
-			hashtable->skewBucket[bucket] = (HashSkewBucket *)
+			hashtable->skewBucket[bucket] = (HashSkewBucket *) /* TODO */
 				MemoryContextAlloc(hashtable->batchCxt,
 								   sizeof(HashSkewBucket));
 			hashtable->skewBucket[bucket]->hashvalue = hashvalue;
-			hashtable->skewBucket[bucket]->tuples = NULL;
+			hashtable->skewBucket[bucket]->tuples.private = NULL;
 			hashtable->skewBucketNums[hashtable->nSkewBuckets] = bucket;
 			hashtable->nSkewBuckets++;
 			hashtable->spaceUsed += SKEW_BUCKET_OVERHEAD;
@@ -1496,18 +2273,29 @@ ExecHashSkewTableInsert(HashJoinTable hashtable,
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	HashJoinTuple hashTuple;
 	int			hashTupleSize;
+	dsa_pointer tuple_pointer;
 
 	/* Create the HashJoinTuple */
 	hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-	hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
-												   hashTupleSize);
+	if (HashJoinTableIsShared(hashtable))
+	{
+		tuple_pointer = dsa_allocate(hashtable->area, hashTupleSize);
+		hashTuple = (HashJoinTuple) dsa_get_address(hashtable->area,
+													tuple_pointer);
+	}
+	else
+	{
+		tuple_pointer = InvalidDsaPointer;
+		hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
+													   hashTupleSize);
+	}
 	hashTuple->hashvalue = hashvalue;
 	memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
 	HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 	/* Push it onto the front of the skew bucket's list */
-	hashTuple->next = hashtable->skewBucket[bucketNumber]->tuples;
-	hashtable->skewBucket[bucketNumber]->tuples = hashTuple;
+	insert_tuple_into_skew_bucket(hashtable, bucketNumber, hashTuple,
+								  tuple_pointer);
 
 	/* Account for space used, and back off if we've used too much */
 	hashtable->spaceUsed += hashTupleSize;
@@ -1538,6 +2326,9 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	int			batchno;
 	HashJoinTuple hashTuple;
 
+	/* TODO: skew buckets not yet supported for parallel mode */
+	Assert(!HashJoinTableIsShared(hashtable));
+
 	/* Locate the bucket to remove */
 	bucketToRemove = hashtable->skewBucketNums[hashtable->nSkewBuckets - 1];
 	bucket = hashtable->skewBucket[bucketToRemove];
@@ -1552,10 +2343,10 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	ExecHashGetBucketAndBatch(hashtable, hashvalue, &bucketno, &batchno);
 
 	/* Process all tuples in the bucket */
-	hashTuple = bucket->tuples;
+	hashTuple = first_tuple_in_skew_bucket(hashtable, bucketToRemove);
 	while (hashTuple != NULL)
 	{
-		HashJoinTuple nextHashTuple = hashTuple->next;
+		HashJoinTuple nextHashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		MinimalTuple tuple;
 		Size		tupleSize;
 
@@ -1581,8 +2372,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 			memcpy(copyTuple, hashTuple, tupleSize);
 			pfree(hashTuple);
 
-			copyTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = copyTuple;
+			insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+									 InvalidDsaPointer);
 
 			/* We have reduced skew space, but overall space doesn't change */
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -1591,9 +2382,9 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 		{
 			/* Put the tuple into a temp file for later batches */
 			Assert(batchno > hashtable->curbatch);
-			ExecHashJoinSaveTuple(tuple, hashvalue,
-								  &hashtable->innerBatchFile[batchno]);
-			pfree(hashTuple);
+			ExecHashJoinSaveTuple(hashtable, tuple, hashvalue,
+								  batchno, true);
+			// pfree(hashTuple); /* TODO:TM */
 			hashtable->spaceUsed -= tupleSize;
 			hashtable->spaceUsedSkew -= tupleSize;
 		}
@@ -1636,6 +2427,173 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 }
 
 /*
+ * For parallel execution, load as much of the next batch as we can as part of
+ * the probing phase for the current batch.  This overlapping means that we do
+ * something useful before we start waiting for other workers.
+ */
+void
+ExecHashPreloadNextBatch(HashJoinTable hashtable)
+{
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier PG_USED_FOR_ASSERTS_ONLY = &hashtable->shared->barrier;
+		int curbatch = hashtable->curbatch;
+		int next_batch = curbatch + 1;
+
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING_BATCH(curbatch));
+
+		/* Prepare to read shared batch files for the next batch. */
+		ExecHashJoinInitializeBatchReader(hashtable, next_batch, true);
+
+		if (next_batch < hashtable->nbatch &&
+			hashtable->innerBatchFile[next_batch] != NULL)
+		{
+			/* TODO: Load into secondary hash table while memory is free! */
+		}
+
+		/*
+		 * TODO: While doing this, also watch for chunks that can be
+		 * rebatched, and help with that.
+		 */
+	}
+}
+
+/*
+ * Add to the primary or secondary tuple counter.
+ */
+static void
+add_tuple_count(HashJoinTable hashtable, int count, bool secondary)
+{
+	if (secondary)
+		pg_atomic_fetch_add_u64(&hashtable->shared->total_secondary_tuples,
+								count);
+	else
+	{
+		uint32 total =
+			pg_atomic_fetch_add_u64(&hashtable->shared->total_primary_tuples,
+									count);
+		/* Also update this backend's counter. */
+		hashtable->totalTuples = total + count;
+	}
+}
+
+/*
+ * Allocate 'size' bytes from the currently active shared HashMemoryChunk.
+ * This is essentially the same as the private memory version, but allocates
+ * from separate chunks for the secondary table and periodically updates the
+ * shared tuple counter.
+ */
+static void *
+dense_alloc_shared(HashJoinTable hashtable,
+				   Size size,
+				   dsa_pointer *shared,
+				   bool secondary)
+{
+	dsa_pointer chunk_shared;
+	HashMemoryChunk chunk;
+	char	   *ptr;
+
+	/* just in case the size is not already aligned properly */
+	size = MAXALIGN(size);
+
+	/*
+	 * If tuple size is larger than of 1/4 of chunk size, allocate a separate
+	 * chunk.
+	 */
+	if (size > HASH_CHUNK_THRESHOLD)
+	{
+		/* allocate new chunk */
+		chunk_shared =
+			dsa_allocate(hashtable->area,
+						 offsetof(HashMemoryChunkData, data) + size);
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, chunk_shared);
+		*shared = chunk_shared + offsetof(HashMemoryChunkData, data);
+		chunk->maxlen = size;
+		chunk->used = size;
+		chunk->ntuples = 1;
+
+		/*
+		 * Push onto the appropriate chunk list, but don't make it the current
+		 * chunk because it hasn't got any more useful space in it.  The
+		 * current chunk may still have space, so keep that one current.
+		 */
+		ExecHashPushChunk(hashtable, chunk, chunk_shared,
+						  secondary ?
+						  &hashtable->shared->head_secondary_chunk :
+						  &hashtable->shared->head_primary_chunk);
+
+		/* Count these huge tuples immediately. */
+		add_tuple_count(hashtable, 1, secondary);
+		return chunk->data;
+	}
+
+	/*
+	 * See if we have enough space for it in the current chunk (if any). If
+	 * not, allocate a fresh chunk.
+	 */
+	chunk = secondary ? hashtable->secondary_chunk : hashtable->primary_chunk;
+	if (chunk == NULL || (chunk->maxlen - chunk->used) < size)
+	{
+		/*
+		 * Add the tuplecount for the outgoing chunk to the shared counter.
+		 * Doing this only every time we need to allocate a new chunk should
+		 * reduce contention on the shared counter.
+		 */
+		if (chunk != NULL)
+			add_tuple_count(hashtable, chunk->ntuples, secondary);
+
+		/*
+		 * Allocate new chunk and make it the current chunk for this backend
+		 * to allocate from.
+		 */
+		chunk_shared =
+			dsa_allocate(hashtable->area,
+						 offsetof(HashMemoryChunkData, data) +
+						 HASH_CHUNK_SIZE);
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, chunk_shared);
+		*shared = chunk_shared + offsetof(HashMemoryChunkData, data);
+		if (secondary)
+		{
+			hashtable->secondary_chunk = chunk;
+			hashtable->secondary_chunk_shared = chunk_shared;
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_secondary_chunk);
+		}
+		else
+		{
+			hashtable->primary_chunk = chunk;
+			hashtable->primary_chunk_shared = chunk_shared;
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_primary_chunk);
+		}
+		chunk->maxlen = HASH_CHUNK_SIZE;
+		chunk->used = size;
+		chunk->ntuples = 1;
+
+		/*
+		 * The shared tuple counter will be updated when this chunk is
+		 * eventually full.  See above.
+		 */
+
+		return chunk->data;
+	}
+
+	/* There is enough space in the current chunk, let's add the tuple */
+	chunk_shared =
+		secondary ? hashtable->secondary_chunk_shared :
+		hashtable->primary_chunk_shared;
+	ptr = chunk->data + chunk->used;
+	*shared = chunk_shared + offsetof(HashMemoryChunkData, data) + chunk->used;
+	chunk->used += size;
+	chunk->ntuples += 1;
+
+	/* return pointer to the start of the tuple memory */
+	return ptr;
+}
+
+/*
  * Allocate 'size' bytes from the currently active HashMemoryChunk
  */
 static void *
@@ -1653,9 +2611,11 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 */
 	if (size > HASH_CHUNK_THRESHOLD)
 	{
+
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-								 offsetof(HashMemoryChunkData, data) + size);
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) + size);
 		newChunk->maxlen = size;
 		newChunk->used = 0;
 		newChunk->ntuples = 0;
@@ -1664,15 +2624,15 @@ dense_alloc(HashJoinTable hashtable, Size size)
 		 * Add this chunk to the list after the first existing chunk, so that
 		 * we don't lose the remaining space in the "current" chunk.
 		 */
-		if (hashtable->chunks != NULL)
+		if (hashtable->primary_chunk != NULL)
 		{
-			newChunk->next = hashtable->chunks->next;
-			hashtable->chunks->next = newChunk;
+			newChunk->next.private = hashtable->primary_chunk->next.private;
+			hashtable->primary_chunk->next.private = newChunk;
 		}
 		else
 		{
-			newChunk->next = hashtable->chunks;
-			hashtable->chunks = newChunk;
+			newChunk->next.private = NULL;
+			hashtable->primary_chunk = newChunk;
 		}
 
 		newChunk->used += size;
@@ -1685,27 +2645,27 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 * See if we have enough space for it in the current chunk (if any). If
 	 * not, allocate a fresh chunk.
 	 */
-	if ((hashtable->chunks == NULL) ||
-		(hashtable->chunks->maxlen - hashtable->chunks->used) < size)
+	if ((hashtable->primary_chunk == NULL) ||
+		(hashtable->primary_chunk->maxlen - hashtable->primary_chunk->used) < size)
 	{
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-					  offsetof(HashMemoryChunkData, data) + HASH_CHUNK_SIZE);
-
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) +
+							   HASH_CHUNK_SIZE);
+		newChunk->next.private = hashtable->primary_chunk;
+		hashtable->primary_chunk = newChunk;
 		newChunk->maxlen = HASH_CHUNK_SIZE;
 		newChunk->used = size;
 		newChunk->ntuples = 1;
 
-		newChunk->next = hashtable->chunks;
-		hashtable->chunks = newChunk;
-
 		return newChunk->data;
 	}
 
 	/* There is enough space in the current chunk, let's add the tuple */
-	ptr = hashtable->chunks->data + hashtable->chunks->used;
-	hashtable->chunks->used += size;
-	hashtable->chunks->ntuples += 1;
+	ptr = hashtable->primary_chunk->data + hashtable->primary_chunk->used;
+	hashtable->primary_chunk->used += size;
+	hashtable->primary_chunk->ntuples += 1;
 
 	/* return pointer to the start of the tuple memory */
 	return ptr;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 369e666..3819151 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -21,8 +21,11 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/barrier.h"
 #include "utils/memutils.h"
 
+#include <unistd.h> /* TODO: remove */
 
 /*
  * States of the ExecHashJoin state machine
@@ -46,7 +49,14 @@ static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 						  BufFile *file,
 						  uint32 *hashvalue,
 						  TupleTableSlot *tupleSlot);
+static TupleTableSlot *ExecHashJoinGetSavedTupleShared(HashJoinTable hashtable,
+						  bool inner,
+						  uint32 batchno,
+						  uint32 *hashvalue,
+						  TupleTableSlot *tupleSlot);
 static bool ExecHashJoinNewBatch(HashJoinState *hjstate);
+static void ExecHashJoinLoadBatch(HashJoinState *hjstate);
+static void ExecHashJoinExportBatches(HashJoinTable hashtable);
 
 
 /* ----------------------------------------------------------------
@@ -147,6 +157,14 @@ ExecHashJoin(HashJoinState *node)
 					/* no chance to not build the hash table */
 					node->hj_FirstOuterTupleSlot = NULL;
 				}
+				else if (hashNode->shared_table_data != NULL)
+				{
+					/*
+					 * TODO: The empty-outer optimization is not implemented
+					 * for shared hash tables yet.
+					 */
+					node->hj_FirstOuterTupleSlot = NULL;
+				}
 				else if (HJ_FILL_OUTER(node) ||
 						 (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
 						  !node->hj_OuterNotEmpty))
@@ -166,7 +184,7 @@ ExecHashJoin(HashJoinState *node)
 				/*
 				 * create the hash table
 				 */
-				hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
+				hashtable = ExecHashTableCreate(hashNode,
 												node->hj_HashOperators,
 												HJ_FILL_INNER(node));
 				node->hj_HashTable = hashtable;
@@ -177,12 +195,29 @@ ExecHashJoin(HashJoinState *node)
 				hashNode->hashtable = hashtable;
 				(void) MultiExecProcNode((PlanState *) hashNode);
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(BarrierPhase(&hashtable->shared->barrier) >=
+						   PHJ_PHASE_HASHING);
+
+					/* Allow other backends to access batches we generated. */
+					ExecHashJoinExportBatches(hashtable);
+
+					/*
+					 * Check if we are a worker that attached too late to
+					 * avoid deadlock risk with the leader.
+					 */
+					if (ExecHashCheckForEarlyExit(hashtable))
+						return NULL;
+				}
+
 				/*
 				 * If the inner relation is completely empty, and we're not
 				 * doing a left outer join, we can quit without scanning the
 				 * outer relation.
 				 */
-				if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
+				if (!HashJoinTableIsShared(hashtable) && /* TODO:TM */
+					hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
 					return NULL;
 
 				/*
@@ -198,12 +233,66 @@ ExecHashJoin(HashJoinState *node)
 				 */
 				node->hj_OuterNotEmpty = false;
 
-				node->hj_JoinState = HJ_NEED_NEW_OUTER;
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Barrier *barrier = &hashtable->shared->barrier;
+					int phase = BarrierPhase(barrier);
+
+					/*
+					 * Map the current phase to the appropriate initial state
+					 * for this worker, so we can get started.
+					 */
+					Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+					hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(phase);
+					switch (PHJ_PHASE_TO_SUBPHASE(phase))
+					{
+					case PHJ_SUBPHASE_PROMOTING:
+						/* Wait for serial phase to finish. */
+						BarrierWait(barrier, WAIT_EVENT_HASHJOIN_PROMOTING);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_LOADING);
+						/* fall through */
+					case PHJ_SUBPHASE_LOADING:
+						/* Help load the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinInitializeBatchReader(hashtable,
+														  hashtable->curbatch,
+														  true);
+						ExecHashJoinLoadBatch(node);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_PROBING);
+						/* fall through */
+					case PHJ_SUBPHASE_PROBING:
+						/* Help probe the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinInitializeBatchReader(hashtable,
+														  hashtable->curbatch,
+														  false);
+						node->hj_JoinState = HJ_NEED_NEW_OUTER;
+						break;
+					case PHJ_SUBPHASE_UNMATCHED:
+						/* Help scan for unmatched inner tuples. */
+						ExecHashUpdate(hashtable);
+						node->hj_JoinState = HJ_FILL_INNER_TUPLES;
+						break;
+					}
+					continue;
+				}
+				else
+					node->hj_JoinState = HJ_NEED_NEW_OUTER;
 
 				/* FALL THRU */
 
 			case HJ_NEED_NEW_OUTER:
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(PHJ_PHASE_TO_BATCHNO(BarrierPhase(&hashtable->shared->barrier)) ==
+						   hashtable->curbatch);
+					Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+						   PHJ_SUBPHASE_PROBING);
+				}
+
 				/*
 				 * We don't have an outer tuple, try to get the next one
 				 */
@@ -213,6 +302,38 @@ ExecHashJoin(HashJoinState *node)
 				if (TupIsNull(outerTupleSlot))
 				{
 					/* end of batch, or maybe whole join */
+					if (HashJoinTableIsShared(hashtable))
+					{
+						/* Allow other backends to access our batches. */
+						ExecHashJoinExportBatches(hashtable);
+						/*
+						 * Check if we are a leader that can't go further than
+						 * probing the first batch without deadlock risk,
+						 * because there are workers running.
+						 */
+						if (ExecHashCheckForEarlyExit(hashtable))
+						{
+							elog(LOG, "leader detaching!");
+							return NULL;
+						}
+
+						/*
+						 * We may be able to load some amount of the next
+						 * batch into spare work_mem, before we start waiting
+						 * for other workers to finish probing the current
+						 * batch.
+						 */
+						ExecHashPreloadNextBatch(hashtable);
+						/*
+						 * You can't start searching for unmatched tuples
+						 * until all workers have finished probing, so we
+						 * synchronize here.
+						 */
+						BarrierWait(&hashtable->shared->barrier,
+									WAIT_EVENT_HASHJOIN_PROBING);
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_UNMATCHED_BATCH(hashtable->curbatch));
+					}
 					if (HJ_FILL_INNER(node))
 					{
 						/* set up to scan for unmatched inner tuples */
@@ -250,9 +371,9 @@ ExecHashJoin(HashJoinState *node)
 					 * Save it in the corresponding outer-batch file.
 					 */
 					Assert(batchno > hashtable->curbatch);
-					ExecHashJoinSaveTuple(ExecFetchSlotMinimalTuple(outerTupleSlot),
-										  hashvalue,
-										&hashtable->outerBatchFile[batchno]);
+					ExecHashJoinSaveTuple(hashtable,
+										  ExecFetchSlotMinimalTuple(outerTupleSlot),
+										  hashvalue, batchno, false);
 					/* Loop around, staying in HJ_NEED_NEW_OUTER state */
 					continue;
 				}
@@ -296,6 +417,13 @@ ExecHashJoin(HashJoinState *node)
 				if (joinqual == NIL || ExecQual(joinqual, econtext, false))
 				{
 					node->hj_MatchedOuter = true;
+					/*
+					 * Note: it is OK to do this in a shared hash table
+					 * without any kind of memory synchronization, because the
+					 * only transition is 0->1, so ordering doesn't matter if
+					 * several backends do it, and there will be a memory
+					 * barrier before anyone reads it.
+					 */
 					HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
 
 					/* In an antijoin, we never return a matched tuple */
@@ -702,10 +830,18 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 		if (file == NULL)
 			return NULL;
 
-		slot = ExecHashJoinGetSavedTuple(hjstate,
-										 file,
-										 hashvalue,
-										 hjstate->hj_OuterTupleSlot);
+		/* TODO: refactor to one function call? */
+		if (HashJoinTableIsShared(hashtable))
+			slot = ExecHashJoinGetSavedTupleShared(hashtable,
+												   false,
+												   curbatch,
+												   hashvalue,
+												   hjstate->hj_OuterTupleSlot);
+		else
+			slot = ExecHashJoinGetSavedTuple(hjstate,
+											 file,
+											 hashvalue,
+											 hjstate->hj_OuterTupleSlot);
 		if (!TupIsNull(slot))
 			return slot;
 	}
@@ -726,13 +862,17 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	int			nbatch;
 	int			curbatch;
-	BufFile    *innerFile;
-	TupleTableSlot *slot;
-	uint32		hashvalue;
+	Barrier	   *barrier;
 
 	nbatch = hashtable->nbatch;
 	curbatch = hashtable->curbatch;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		barrier = &hashtable->shared->barrier;
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_UNMATCHED_BATCH(curbatch));
+	}
+
 	if (curbatch > 0)
 	{
 		/*
@@ -793,6 +933,20 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 			nbatch != hashtable->nbatch_outstart)
 			break;				/* must process due to rule 3 */
 		/* We can ignore this batch. */
+		if (HashJoinTableIsShared(hashtable))
+		{
+			/* Skip the batch, but stay in sync with group. */
+			Assert(BarrierPhase(barrier) == PHJ_PHASE_UNMATCHED_BATCH(curbatch - 1));
+			ExecHashTableReset(hashtable);
+			Assert(BarrierPhase(barrier) == PHJ_PHASE_LOADING_BATCH(curbatch));
+			if (BarrierWait(&hashtable->shared->barrier,
+							WAIT_EVENT_HASHJOIN_SKIP_LOADING))
+				ExecHashJoinResetBatchReaders(hashtable);
+			Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING_BATCH(curbatch));
+			BarrierWait(&hashtable->shared->barrier,
+						WAIT_EVENT_HASHJOIN_SKIP_PROBING);
+			Assert(BarrierPhase(barrier) == PHJ_PHASE_UNMATCHED_BATCH(curbatch));
+		}
 		/* Release associated temp files right away. */
 		if (hashtable->innerBatchFile[curbatch])
 			BufFileClose(hashtable->innerBatchFile[curbatch]);
@@ -812,26 +966,63 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	 * Reload the hash table with the new inner batch (which could be empty)
 	 */
 	ExecHashTableReset(hashtable);
+	ExecHashJoinLoadBatch(hjstate);
+
+	return true;
+}
+
+static void
+ExecHashJoinLoadBatch(HashJoinState *hjstate)
+{
+	HashJoinTable hashtable = hjstate->hj_HashTable;
+	int			curbatch = hashtable->curbatch;
+	BufFile    *innerFile;
+	TupleTableSlot *slot;
+	uint32		hashvalue;
+
+	if (HashJoinTableIsShared(hashtable))
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_LOADING);
 
 	innerFile = hashtable->innerBatchFile[curbatch];
 
 	if (innerFile != NULL)
 	{
-		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
-			ereport(ERROR,
-					(errcode_for_file_access(),
+		/*
+		 * TODO: Do not rewind inner batch file for shared hash tables,
+		 * because ExecHashPreloadNextBatch already did that and left the read
+		 * head at the right place for us to continue.  Tidy up...
+		 */
+		if (!HashJoinTableIsShared(hashtable))
+		{
+			if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
+				ereport(ERROR,
+						(errcode_for_file_access(),
 				   errmsg("could not rewind hash-join temporary file: %m")));
+		}
 
-		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
+		for (;;)
+		{
+			/* TODO: refactor this into one function call? */
+			if (HashJoinTableIsShared(hashtable))
+				slot = ExecHashJoinGetSavedTupleShared(hashtable,
+													   true,
+													   curbatch,
+													   &hashvalue,
+													   hjstate->hj_HashTupleSlot);
+			else
+				slot = ExecHashJoinGetSavedTuple(hjstate,
 												 innerFile,
 												 &hashvalue,
-												 hjstate->hj_HashTupleSlot)))
-		{
+												 hjstate->hj_HashTupleSlot);
+			if (slot == NULL)
+				break;
+
 			/*
 			 * NOTE: some tuples may be sent to future batches.  Also, it is
 			 * possible for hashtable->nbatch to be increased here!
 			 */
-			ExecHashTableInsert(hashtable, slot, hashvalue);
+			ExecHashTableInsert(hashtable, slot, hashvalue, false);
 		}
 
 		/*
@@ -845,7 +1036,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	/*
 	 * Rewind outer batch file (if present), so that we can start reading it.
 	 */
-	if (hashtable->outerBatchFile[curbatch] != NULL)
+	if (!HashJoinTableIsShared(hashtable) && hashtable->outerBatchFile[curbatch] != NULL)
 	{
 		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
 			ereport(ERROR,
@@ -853,7 +1044,112 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 				   errmsg("could not rewind hash-join temporary file: %m")));
 	}
 
-	return true;
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Wait until all workers have finished loading their portion of the
+		 * hash table, so that all workers can start probing.
+		 */
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASHJOIN_LOADING))
+			ExecHashJoinResetBatchReaders(hashtable);
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+		ExecHashJoinInitializeBatchReader(hashtable, hashtable->curbatch, false);
+	}
+}
+
+/*
+ * Export a BufFile, copy the descriptor to DSA memory and return the
+ * dsa_pointer.
+ */
+static dsa_pointer
+make_batch_descriptor(dsa_area *area, BufFile *file)
+{
+	dsa_pointer pointer;
+	BufFileDescriptor *source;
+	BufFileDescriptor *target;
+	size_t size;
+
+	source = BufFileExport(file);
+	size = BufFileDescriptorSize(source);
+	pointer = dsa_allocate(area, size);
+	if (!DsaPointerIsValid(pointer))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed on dsa_allocate of size %zu.", size)));
+	target = dsa_get_address(area, pointer);
+	memcpy(target, source, size);
+	pfree(source);
+
+	return pointer;
+}
+
+/*
+ * Publish a batch descriptor for a future batch so that other participants
+ * can import it and read it.  If 'descriptor' is InvalidDsaPointer, then
+ * forget the published descriptor so that it will be reexported later.
+ */
+static void
+set_batch_descriptor(HashJoinTable hashtable, int batchno, bool inner,
+					 dsa_pointer descriptor)
+{
+	HashJoinParticipantState *participant;
+	dsa_pointer *level1;
+	dsa_pointer *level2;
+	int rank;
+	int index;
+
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	rank = fls(batchno);
+	index = batchno % (1 << (rank - 1));
+	level1 = inner ? participant->inner_batch_descriptors
+				   : participant->outer_batch_descriptors;
+	if (level1[rank] == InvalidDsaPointer)
+	{
+		size_t size = sizeof(dsa_pointer) * (1 << rank);
+
+		level1[rank] = dsa_allocate(hashtable->area, size);
+		if (level1[rank] == InvalidDsaPointer)
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory"),
+					 errdetail("Failed on dsa_allocate of size %zu.", size)));
+		level2 = dsa_get_address(hashtable->area, level1[rank]);
+		memset(level2, 0, size);
+	}
+	level2 = dsa_get_address(hashtable->area, level1[rank]);
+	if (level2[index] != InvalidDsaPointer)
+		dsa_free(hashtable->area, level2[index]);
+	level2[index] = descriptor;
+}
+
+/*
+ * Get a batch descriptor published by a given participant, if there is one.
+ */
+static BufFileDescriptor *
+get_batch_descriptor(HashJoinTable hashtable, int participant_number,
+					 int batchno, bool inner)
+{
+	HashJoinParticipantState *participant;
+	dsa_pointer *level1;
+	dsa_pointer *level2;
+	int rank;
+	int index;
+
+	participant = &hashtable->shared->participants[participant_number];
+	rank = fls(batchno);
+	index = batchno % (1 << (rank - 1));
+	level1 = inner ? participant->inner_batch_descriptors
+				   : participant->outer_batch_descriptors;
+	if (level1[rank] == InvalidDsaPointer)
+		return NULL;
+	level2 = dsa_get_address(hashtable->area, level1[rank]);
+	if (level2[index] == InvalidDsaPointer)
+		return NULL;
+
+	return (BufFileDescriptor *)
+		dsa_get_address(hashtable->area, level2[index]);
 }
 
 /*
@@ -868,17 +1164,33 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
  * will get messed up.
  */
 void
-ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr)
+ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno,
+					  bool inner)
+					  //BufFile **fileptr)
 {
-	BufFile    *file = *fileptr;
+	BufFile    *file;
 	size_t		written;
 
+	if (inner)
+		file = hashtable->innerBatchFile[batchno];
+	else
+		file = hashtable->outerBatchFile[batchno];
 	if (file == NULL)
 	{
 		/* First write to this batch file, so open it. */
 		file = BufFileCreateTemp(false);
-		*fileptr = file;
+		if (inner)
+			hashtable->innerBatchFile[batchno] = file;
+		else
+			hashtable->outerBatchFile[batchno] = file;
+	}
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* This batch needs to be re-exported, if it was already exported. */
+		set_batch_descriptor(hashtable, batchno, inner, InvalidDsaPointer);
 	}
 
 	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
@@ -939,10 +1251,229 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	return ExecStoreMinimalTuple(tuple, tupleSlot, true);
 }
 
+/*
+ * Export unexported future batches created by this participant, so that other
+ * participants can read from them after they have finished reading their own.
+ */
+static void
+ExecHashJoinExportBatches(HashJoinTable hashtable)
+{
+	int i;
+
+	/* Find this participant's HashJoinParticipantState object. */
+	Assert(HashJoinParticipantNumber() < hashtable->shared->planned_participants);
+
+	/* Export future batches and copy their descriptors into DSA memory. */
+	for (i = hashtable->curbatch + 1; i < hashtable->nbatch; ++i)
+	{
+		if (hashtable->innerBatchFile[i] != NULL &&
+			get_batch_descriptor(hashtable, HashJoinParticipantNumber(), i, true) == InvalidDsaPointer)
+			set_batch_descriptor(hashtable, i, true,
+				make_batch_descriptor(hashtable->area, hashtable->innerBatchFile[i]));
+		if (hashtable->outerBatchFile[i] != NULL &&
+			get_batch_descriptor(hashtable, HashJoinParticipantNumber(), i, false) == InvalidDsaPointer)
+			set_batch_descriptor(hashtable, i, true,
+				make_batch_descriptor(hashtable->area, hashtable->outerBatchFile[i]));
+	}
+}
+
+/*
+ * Initialize the batch reader to prepare it for reading a given batch.
+ */
+void
+ExecHashJoinInitializeBatchReader(HashJoinTable hashtable,
+								  int batchno,
+								  bool inner)
+{
+	HashJoinBatchReader *batch_reader;
+	HashJoinParticipantState *participant;
+
+	batch_reader = &hashtable->batch_reader;
+
+	if (!HashJoinTableIsShared(hashtable))
+		return;
+	if (hashtable->nbatch <= 1)
+		return;
+
+	/* We always start reading from the batch file that this backend wrote. */
+	batch_reader->participant_number = HashJoinParticipantNumber();
+	batch_reader->head.fileno = batch_reader->head.offset = -1;
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	if (inner)
+	{
+		batch_reader->shared = &participant->inner_batch_reader;
+		batch_reader->file = hashtable->innerBatchFile[batchno];
+	}
+	else
+	{
+		batch_reader->shared = &participant->outer_batch_reader;
+		batch_reader->file = hashtable->outerBatchFile[batchno];
+	}
+}
+
+/*
+ * Reset the shared read heads on all shared batch file readers.  Must
+ * be called only in one backend.
+ */
+void
+ExecHashJoinResetBatchReaders(HashJoinTable hashtable)
+{
+	int i;
+
+	for (i = 0; i < hashtable->shared->planned_participants; ++i)
+	{
+		hashtable->shared->participants[i].inner_batch_reader.head.fileno = 0;
+		hashtable->shared->participants[i].inner_batch_reader.head.offset = 0;
+		hashtable->shared->participants[i].outer_batch_reader.head.fileno = 0;
+		hashtable->shared->participants[i].outer_batch_reader.head.offset = 0;
+	}
+}
+
+/*
+ * ExecHashJoinGetSavedTupleShared
+ *		read the next tuple from a batch file, including the batch files of
+ * 		other participants.  Return NULL if no more.
+ *
+ * On success, *hashvalue is set to the tuple's hash value, and the tuple
+ * itself is stored in the given slot.
+ */
+static TupleTableSlot *
+ExecHashJoinGetSavedTupleShared(HashJoinTable hashtable,
+								bool inner,
+								uint32 batchno,
+								uint32 *hashvalue,
+								TupleTableSlot *tupleSlot)
+{
+	TupleTableSlot *result = NULL;
+	HashJoinBatchReader *batch_reader = &hashtable->batch_reader;
+	BufFileDescriptor *descriptor;
+
+	Assert(HashJoinTableIsShared(hashtable));
+
+	for (;;)
+	{
+		uint32		header[2];
+		size_t		nread;
+		MinimalTuple tuple;
+
+		if (hashtable->batch_reader.file == NULL)
+		{
+			/*
+			 * No file found for the current participant.  Try stealing tuples
+			 * from the next participant.
+			 */
+			goto next_participant;
+		}
+
+		LWLockAcquire(&batch_reader->shared->lock, LW_EXCLUSIVE);
+		if (batch_reader->shared->error)
+		{
+			/* Don't try to read if reading failed in some other backend. */
+			ereport(ERROR,
+					(errcode_for_file_access(),
+				 errmsg("could not read from hash-join temporary file")));
+		}
+
+		/* Set the shared error flag, which we'll clear if we succeed. */
+		batch_reader->shared->error = true;
+
+		/*
+		 * If another worker has moved the shared read head since we last read,
+		 * we'll need to seek to the new shared position.
+		 */
+		if (batch_reader->head.fileno != batch_reader->shared->head.fileno ||
+			batch_reader->head.offset != batch_reader->shared->head.offset)
+		{
+			BufFileSeek(batch_reader->file,
+						batch_reader->shared->head.fileno,
+						batch_reader->shared->head.offset,
+						SEEK_SET);
+			batch_reader->head = batch_reader->shared->head;
+		}
+
+		/* Try to read the size and hash. */
+		nread = BufFileRead(batch_reader->file, (void *) header, sizeof(header));
+		if (nread > 0)
+		{
+			if (nread != sizeof(header))
+			{
+				ereport(ERROR,
+					(errcode_for_file_access(),
+				 errmsg("could not read from hash-join temporary file: %m")));
+			}
+			*hashvalue = header[0];
+			tuple = (MinimalTuple) palloc(header[1]);
+			tuple->t_len = header[1];
+			nread = BufFileRead(batch_reader->file,
+								(void *) ((char *) tuple + sizeof(uint32)),
+								header[1] - sizeof(uint32));
+			if (nread != header[1] - sizeof(uint32))
+			{
+				ereport(ERROR,
+						(errcode_for_file_access(),
+				 errmsg("could not read from hash-join temporary file: %m")));
+			}
+
+			result = ExecStoreMinimalTuple(tuple, tupleSlot, true);
+
+		}
+		/* Commit to shared memory. */
+		BufFileTell(batch_reader->file,
+					&batch_reader->head.fileno,
+					&batch_reader->head.offset);
+		batch_reader->shared->head = batch_reader->head;
+		batch_reader->shared->error = false;
+		LWLockRelease(&batch_reader->shared->lock);
+
+		if (result != NULL)
+			return result;
+
+next_participant:
+		/* Try the next participant's batch file. */
+		batch_reader->participant_number =
+			(batch_reader->participant_number + 1) %
+				hashtable->shared->planned_participants;
+		if (batch_reader->participant_number == HashJoinParticipantNumber())
+		{
+			/*
+			 * We've made it all the way back to the file we started with,
+			 * which is the one that this backend wrote.  So there are no more
+			 * tuples to be had in any participant's batch file.
+			 */
+			ExecClearTuple(tupleSlot);
+			return NULL;
+		}
+
+		/* Import the BufFile from that participant, if it exported one. */
+		descriptor = get_batch_descriptor(hashtable,
+										  batch_reader->participant_number,
+										  batchno,
+										  inner);
+		if (descriptor == NULL)
+			batch_reader->file = NULL;
+		else
+			batch_reader->file = BufFileImport(descriptor);
+		batch_reader->shared = inner ? &hashtable->shared->participants[batch_reader->participant_number].inner_batch_reader
+									 : &hashtable->shared->participants[batch_reader->participant_number].outer_batch_reader;
+		batch_reader->head.fileno = batch_reader->head.offset = 0;
+	}
+}
 
 void
 ExecReScanHashJoin(HashJoinState *node)
 {
+	HashState *hashNode = (HashState *) innerPlanState(node);
+
+	/* We can't use HashJoinTableIsShared if the table is NULL. */
+	if (hashNode->shared_table_data != NULL)
+	{
+		elog(ERROR, "TODO: ExecReScanHashJoin not working yet");
+
+		/* Coordinate a rewind to the shared hash table creation phase. */
+		BarrierWaitSet(&hashNode->shared_table_data->barrier, PHJ_PHASE_INIT,
+					   WAIT_EVENT_HASHJOIN_REWINDING);
+	}
+
 	/*
 	 * In a multi-batch join, we currently have to do rescans the hard way,
 	 * primarily because batch temp files may have already been released. But
@@ -977,6 +1508,15 @@ ExecReScanHashJoin(HashJoinState *node)
 
 			/* ExecHashJoin can skip the BUILD_HASHTABLE step */
 			node->hj_JoinState = HJ_NEED_NEW_OUTER;
+
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+								/* Coordinate a rewind to the shared probing phase. */
+				if (BarrierWaitSet(&hashNode->shared_table_data->barrier,
+								   PHJ_PHASE_PROBING,
+								   WAIT_EVENT_HASHJOIN_REWINDING2))
+					ExecHashJoinResetBatchReaders(node->hj_HashTable);
+			}
 		}
 		else
 		{
@@ -985,6 +1525,14 @@ ExecReScanHashJoin(HashJoinState *node)
 			node->hj_HashTable = NULL;
 			node->hj_JoinState = HJ_BUILD_HASHTABLE;
 
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+				/* Coordinate a rewind to the shared hash table creation phase. */
+				BarrierWaitSet(&hashNode->shared_table_data->barrier,
+							   PHJ_PHASE_INIT,
+							   WAIT_EVENT_HASHJOIN_REWINDING3);
+			}
+
 			/*
 			 * if chgParam of subnode is not null then plan will be re-scanned
 			 * by first ExecProcNode.
@@ -1011,3 +1559,76 @@ ExecReScanHashJoin(HashJoinState *node)
 	if (node->js.ps.lefttree->chgParam == NULL)
 		ExecReScan(node->js.ps.lefttree);
 }
+
+void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt)
+{
+	size_t size;
+
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * (pcxt->nworkers + 1);
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+void
+ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
+{
+	HashState *hashNode;
+	SharedHashJoinTable shared;
+	size_t size;
+	int planned_participants;
+
+	/*
+	 * Set up the state needed to coordinate access to the shared hash table,
+	 * using the plan node ID as the toc key.
+	 */
+	planned_participants = pcxt->nworkers + 1;	/* possible workers + leader */
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * planned_participants;
+	shared = shm_toc_allocate(pcxt->toc, size);
+	BarrierInit(&shared->barrier, 0);
+	shared->primary_buckets = InvalidDsaPointer;
+	shared->secondary_buckets = InvalidDsaPointer;
+	pg_atomic_init_u32(&shared->next_unmatched_bucket, 0);
+	pg_atomic_init_u64(&shared->total_primary_tuples, 0);
+	pg_atomic_init_u64(&shared->total_secondary_tuples, 0);
+	dsa_pointer_atomic_init(&shared->head_primary_chunk, InvalidDsaPointer);
+	dsa_pointer_atomic_init(&shared->head_secondary_chunk, InvalidDsaPointer);
+	dsa_pointer_atomic_init(&shared->chunks_to_rebucket, InvalidDsaPointer);
+	shared->planned_participants = planned_participants;
+	shm_toc_insert(pcxt->toc, state->js.ps.plan->plan_node_id, shared);
+
+	/*
+	 * Pass the SharedHashJoinTable to the hash node.  If the Gather node
+	 * running in the leader backend decides to execute the hash join, it
+	 * hasn't called ExecHashJoinInitializeWorker so it doesn't have
+	 * state->shared_table_data set up.  So we must do it here.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = shared;
+}
+
+void
+ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc)
+{
+	HashState  *hashNode;
+
+	state->hj_sharedHashJoinTable =
+		shm_toc_lookup(toc, state->js.ps.plan->plan_node_id);
+
+	/*
+	 * Inject SharedHashJoinTable into the hash node.  It could instead have
+	 * its own ExecHashInitializeWorker function, but we only want to set its
+	 * 'parallel_aware' flag if we want to tell it to actually build the hash
+	 * table in parallel.  Since its parallel_aware flag also controls whether
+	 * its 'InitializeWorker' function gets called, and it also needs access
+	 * to this object for serial shared hash mode, we'll pass it on here
+	 * instead of depending on that.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = state->hj_sharedHashJoinTable;
+	Assert(hashNode->shared_table_data != NULL);
+
+	Assert(HashJoinParticipantNumber() <
+		   hashNode->shared_table_data->planned_participants);
+}
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 00bf3a5..361eb5d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
 
+#include <unistd.h>
+
 static void InitScanRelation(SeqScanState *node, EState *estate, int eflags);
 static TupleTableSlot *SeqNext(SeqScanState *node);
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ae86954..ca215dd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1993,6 +1993,7 @@ _outHashPath(StringInfo str, const HashPath *node)
 
 	WRITE_NODE_FIELD(path_hashclauses);
 	WRITE_INT_FIELD(num_batches);
+	WRITE_ENUM_FIELD(table_type, HashPathTableType);
 }
 
 static void
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2a49639..79c7650 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -104,6 +104,7 @@
 double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
 double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
+double		cpu_shared_tuple_cost = DEFAULT_CPU_SHARED_TUPLE_COST;
 double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
@@ -2694,7 +2695,8 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors)
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type)
 {
 	Cost		startup_cost = 0;
 	Cost		run_cost = 0;
@@ -2725,6 +2727,26 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
 
 	/*
+	 * If this is a shared hash table, there is a extra charge for inserting
+	 * each tuple into the shared hash table, to cover the overhead of memory
+	 * synchronization that makes the hash table slightly slower to build than
+	 * a private hash table.  There is no extra charge for probing the hash
+	 * table for outer path row, on the basis that read-only access to the
+	 * hash table shouldn't generate any extra memory synchronization.
+	 *
+	 * TODO: Really what we want is some guess at the number of cache sync
+	 * overhead generated by inserting into cachelines that have been
+	 * invalidated by someone else inserting into a bucket in the same
+	 * cacheline.  Not sure if it's better to introduce a
+	 * cpu_cacheline_sync_cost (or _miss_cost?) and then here estimate the
+	 * number of collisions we expect based by num buckets, cacheline size,
+	 * num workers.  But that might be too detailed/low level/variable
+	 * heavy/bogus.
+	 */
+	if (table_type != HASHPATH_TABLE_PRIVATE)
+		startup_cost += cpu_shared_tuple_cost * inner_path_rows;
+
+	/*
 	 * Get hash table size that executor would use for inner relation.
 	 *
 	 * XXX for the moment, always assume that skew optimization will be
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cc7384f..87c4cef 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -483,7 +483,8 @@ try_hashjoin_path(PlannerInfo *root,
 				  Path *inner_path,
 				  List *hashclauses,
 				  JoinType jointype,
-				  JoinPathExtraData *extra)
+				  JoinPathExtraData *extra,
+				  HashPathTableType table_type)
 {
 	Relids		required_outer;
 	JoinCostWorkspace workspace;
@@ -508,7 +509,7 @@ try_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors, table_type);
 
 	if (add_path_precheck(joinrel,
 						  workspace.startup_cost, workspace.total_cost,
@@ -525,7 +526,8 @@ try_hashjoin_path(PlannerInfo *root,
 									  inner_path,
 									  extra->restrictlist,
 									  required_outer,
-									  hashclauses));
+									  hashclauses,
+									  table_type));
 	}
 	else
 	{
@@ -546,7 +548,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 						  Path *inner_path,
 						  List *hashclauses,
 						  JoinType jointype,
-						  JoinPathExtraData *extra)
+						  JoinPathExtraData *extra,
+						  HashPathTableType table_type)
 {
 	JoinCostWorkspace workspace;
 
@@ -571,7 +574,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors,
+						  table_type);
 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
 		return;
 
@@ -587,7 +591,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 										  inner_path,
 										  extra->restrictlist,
 										  NULL,
-										  hashclauses));
+										  hashclauses,
+										  table_type));
 }
 
 /*
@@ -1356,7 +1361,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			/* no possibility of cheap startup here */
 		}
 		else if (jointype == JOIN_UNIQUE_INNER)
@@ -1372,7 +1378,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			if (cheapest_startup_outer != NULL &&
 				cheapest_startup_outer != cheapest_total_outer)
 				try_hashjoin_path(root,
@@ -1381,7 +1388,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 		}
 		else
 		{
@@ -1402,7 +1410,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 
 			foreach(lc1, outerrel->cheapest_parameterized_paths)
 			{
@@ -1436,7 +1445,8 @@ hash_inner_and_outer(PlannerInfo *root,
 									  innerpath,
 									  hashclauses,
 									  jointype,
-									  extra);
+									  extra,
+									  HASHPATH_TABLE_PRIVATE);
 				}
 			}
 		}
@@ -1445,23 +1455,32 @@ hash_inner_and_outer(PlannerInfo *root,
 		 * If the joinrel is parallel-safe, we may be able to consider a
 		 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
 		 * because the outer path will be partial, and therefore we won't be
-		 * able to properly guarantee uniqueness.  Similarly, we can't handle
-		 * JOIN_FULL and JOIN_RIGHT, because they can produce false null
-		 * extended rows.  Also, the resulting path must not be parameterized.
+		 * able to properly guarantee uniqueness.  Also, the resulting path
+		 * must not be parameterized.
 		 */
 		if (joinrel->consider_parallel &&
 			jointype != JOIN_UNIQUE_OUTER &&
-			jointype != JOIN_FULL &&
-			jointype != JOIN_RIGHT &&
 			outerrel->partial_pathlist != NIL &&
 			bms_is_empty(joinrel->lateral_relids))
 		{
 			Path	   *cheapest_partial_outer;
+			Path	   *cheapest_partial_inner = NULL;
 			Path	   *cheapest_safe_inner = NULL;
 
 			cheapest_partial_outer =
 				(Path *) linitial(outerrel->partial_pathlist);
 
+			/* Can we use a partial inner plan too? */
+			if (innerrel->partial_pathlist != NIL)
+				cheapest_partial_inner =
+					(Path *) linitial(innerrel->partial_pathlist);
+			if (cheapest_partial_inner != NULL)
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_partial_inner,
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_PARALLEL);
+
 			/*
 			 * Normally, given that the joinrel is parallel-safe, the cheapest
 			 * total inner path will also be parallel-safe, but if not, we'll
@@ -1488,10 +1507,20 @@ hash_inner_and_outer(PlannerInfo *root,
 			}
 
 			if (cheapest_safe_inner != NULL)
+			{
+				/* Try a shared table with only one worker building the table. */
 				try_partial_hashjoin_path(root, joinrel,
 										  cheapest_partial_outer,
 										  cheapest_safe_inner,
-										  hashclauses, jointype, extra);
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_SERIAL);
+				/* Also private hash tables, built by each worker. */
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_safe_inner,
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_PRIVATE);
+			}
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ad49674..4954c4c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3938,6 +3938,23 @@ create_hashjoin_plan(PlannerInfo *root,
 	copy_plan_costsize(&hash_plan->plan, inner_plan);
 	hash_plan->plan.startup_cost = hash_plan->plan.total_cost;
 
+	/*
+	 * Set the table as sharable if appropriate, with parallel or serial
+	 * building.
+	 */
+	switch (best_path->table_type)
+	{
+	case HASHPATH_TABLE_SHARED_PARALLEL:
+		hash_plan->shared_table = true;
+		hash_plan->plan.parallel_aware = true;
+		break;
+	case HASHPATH_TABLE_SHARED_SERIAL:
+		hash_plan->shared_table = true;
+		break;
+	case HASHPATH_TABLE_PRIVATE:
+		break;
+	}
+
 	join_plan = make_hashjoin(tlist,
 							  joinclauses,
 							  otherclauses,
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..68cabe6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2096,6 +2096,7 @@ create_mergejoin_path(PlannerInfo *root,
  * 'required_outer' is the set of required outer rels
  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
  *		(this should be a subset of the restrict_clauses list)
+ * 'table_type' for level of hash table sharing
  */
 HashPath *
 create_hashjoin_path(PlannerInfo *root,
@@ -2108,7 +2109,8 @@ create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses)
+					 List *hashclauses,
+					 HashPathTableType table_type)
 {
 	HashPath   *pathnode = makeNode(HashPath);
 
@@ -2123,9 +2125,13 @@ create_hashjoin_path(PlannerInfo *root,
 								  sjinfo,
 								  required_outer,
 								  &restrict_clauses);
-	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_aware =
+		joinrel->consider_parallel &&
+		(table_type == HASHPATH_TABLE_SHARED_SERIAL ||
+		 table_type == HASHPATH_TABLE_SHARED_PARALLEL);
 	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
+	pathnode->table_type = table_type;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
 	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a392197..00619e4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3393,6 +3393,54 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_HASH_CREATING:
+			event_name = "Hash/Creating";
+			break;
+		case WAIT_EVENT_HASH_HASHING:
+			event_name = "Hash/Hashing";
+			break;
+		case WAIT_EVENT_HASH_RESIZING:
+			event_name = "Hash/Resizing";
+			break;
+		case WAIT_EVENT_HASH_REBUCKETING:
+			event_name = "Hash/Rebucketing";
+			break;
+		case WAIT_EVENT_HASH_INIT:
+			event_name = "Hash/Init";
+			break;
+		case WAIT_EVENT_HASH_DESTROY:
+			event_name = "Hash/Destroy";
+			break;
+		case WAIT_EVENT_HASH_UNMATCHED:
+			event_name = "Hash/Unmatched";
+			break;
+		case WAIT_EVENT_HASH_PROMOTING:
+			event_name = "Hash/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROMOTING:
+			event_name = "HashJoin/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROBING:
+			event_name = "HashJoin/Probing";
+			break;
+		case WAIT_EVENT_HASHJOIN_SKIP_LOADING:
+			event_name = "HashJoin/SkipLoading";
+			break;
+		case WAIT_EVENT_HASHJOIN_SKIP_PROBING:
+			event_name = "HashJoin/SkipProbing";;
+			break;
+		case WAIT_EVENT_HASHJOIN_LOADING:
+			event_name = "HashJoin/Loading";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING:
+			event_name = "HashJoin/Rewinding";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING2:
+			event_name = "HashJoin/Rewinding2";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING3:
+			event_name = "HashJoin/Rewinding3";;
+			break;
 		/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 042be79..b38cbd8 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -42,6 +42,8 @@
 #include "storage/buf_internals.h"
 #include "utils/resowner.h"
 
+extern int ParallelWorkerNumber;
+
 /*
  * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
  * The reason is that we'd like large temporary BufFiles to be spread across
@@ -89,6 +91,24 @@ struct BufFile
 	char		buffer[BLCKSZ];
 };
 
+/*
+ * Serialized representation of a single file managed by a BufFile.
+ */
+typedef struct BufFileFileDescriptor
+{
+	char path[MAXPGPATH];
+} BufFileFileDescriptor;
+
+/*
+ * Serialized representation of a BufFile, to be created by BufFileExport and
+ * consumed by BufFileImport.
+ */
+struct BufFileDescriptor
+{
+	size_t num_files;
+	BufFileFileDescriptor files[FLEXIBLE_ARRAY_MEMBER];
+};
+
 static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
@@ -178,6 +198,83 @@ BufFileCreateTemp(bool interXact)
 	return file;
 }
 
+/*
+ * Export a BufFile description in a serialized form so that another backend
+ * can attach to it and read from it.  The format is opaque, but it may be
+ * bitwise copied, and its size may be obtained with BufFileDescriptorSize().
+ */
+BufFileDescriptor *
+BufFileExport(BufFile *file)
+{
+	BufFileDescriptor *descriptor;
+	int i;
+
+	/* Flush output from local buffers. */
+	BufFileFlush(file);
+
+	/*
+	 * TODO: FIXME: disable cleanup until I can figure out a decent cleanup
+	 * strategy!
+	 */
+	file->isInterXact = true;
+
+	/* Create and fill in a descriptor. */
+	descriptor = palloc0(offsetof(BufFileDescriptor, files) +
+						 sizeof(BufFileFileDescriptor) * file->numFiles);
+	descriptor->num_files = file->numFiles;
+	for (i = 0; i < descriptor->num_files; ++i)
+		strcpy(descriptor->files[i].path, FilePathName(file->files[i]));
+
+	return descriptor;
+}
+
+/*
+ * Return the size in bytes of a BufFileDescriptor, so that it can be copied.
+ */
+size_t
+BufFileDescriptorSize(const BufFileDescriptor *descriptor)
+{
+	return offsetof(BufFileDescriptor, files) +
+		sizeof(BufFileFileDescriptor) * descriptor->num_files;
+}
+
+/*
+ * Open a BufFile that was created by another backend and then exported.  The
+ * file must be read-only in all backends, and is still owned by the backend
+ * that created it.  This provides a way for cooperating backends to share
+ * immutable temporary data such as hash join batches.
+ */
+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{
+	BufFile    *file = (BufFile *) palloc(sizeof(BufFile));
+	int i;
+
+	file->numFiles = descriptor->num_files;
+	file->files = (File *) palloc0(sizeof(File) * descriptor->num_files);
+	file->offsets = (off_t *) palloc0(sizeof(off_t) * descriptor->num_files);
+	file->isTemp = false;
+	file->isInterXact = true; /* prevent cleanup by this backend */
+	file->dirty = false;
+	file->resowner = CurrentResourceOwner;
+	file->curFile = 0;
+	file->curOffset = 0L;
+	file->pos = 0;
+	file->nbytes = 0;
+
+	for (i = 0; i < descriptor->num_files; ++i)
+	{
+		file->files[i] =
+			PathNameOpenFile(descriptor->files[i].path,
+							 O_RDONLY | PG_BINARY, 0600);
+		if (file->files[i] <= 0)
+			elog(ERROR, "failed to import \"%s\": %m",
+				 descriptor->files[i].path);
+	}
+
+	return file;
+}
+
 #ifdef NOT_USED
 /*
  * Create a BufFile and attach it to an already-opened virtual File.
diff --git a/src/backend/storage/ipc/barrier.c b/src/backend/storage/ipc/barrier.c
index 8b83c1d..5a45103 100644
--- a/src/backend/storage/ipc/barrier.c
+++ b/src/backend/storage/ipc/barrier.c
@@ -16,6 +16,7 @@
 
 #include "storage/barrier.h"
 
+
 /*
  * Initialize this barrier, setting a static number of participants that we
  * will wait for at each computation phase.  To use a dynamic number of
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2d3cf9e..9becab0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -749,6 +749,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 		}
 
 		/* Values only available to role member */
+		elog(LOG, "XXX pid %d -> %d", beentry->st_procpid, has_privs_of_role(GetUserId(), beentry->st_userid));
 		if (has_privs_of_role(GetUserId(), beentry->st_userid))
 		{
 			SockAddr	zero_clientaddr;
@@ -788,7 +789,6 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				raw_wait_event = UINT32_ACCESS_ONCE(proc->wait_event_info);
 				wait_event_type = pgstat_get_wait_event_type(raw_wait_event);
 				wait_event = pgstat_get_wait_event(raw_wait_event);
-
 			}
 			else
 			{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 65660c1..9b49918 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2857,6 +2857,16 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 	{
+		{"cpu_shared_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
+			gettext_noop("Sets the planner's estimate of the cost of "
+						 "sharing each tuple with other parallel workers."),
+			NULL
+		},
+		&cpu_shared_tuple_cost,
+		DEFAULT_CPU_TUPLE_COST, -DBL_MAX, DBL_MAX,
+		NULL, NULL, NULL
+	},
+	{
 		{"cpu_index_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
 			gettext_noop("Sets the planner's estimate of the cost of "
 						 "processing each index entry during an index scan."),
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 6d0e12b..715d420 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -15,7 +15,13 @@
 #define HASHJOIN_H
 
 #include "nodes/execnodes.h"
+#include "port/atomics.h"
+#include "storage/barrier.h"
 #include "storage/buffile.h"
+#include "storage/dsa.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/spin.h"
 
 /* ----------------------------------------------------------------
  *				hash-join hash table structures
@@ -63,7 +69,12 @@
 
 typedef struct HashJoinTupleData
 {
-	struct HashJoinTupleData *next;		/* link to next tuple in same bucket */
+	/* link to next tuple in same bucket */
+	union
+	{
+		dsa_pointer shared;
+		struct HashJoinTupleData *private;
+	} next;
 	uint32		hashvalue;		/* tuple's hash code */
 	/* Tuple data, in MinimalTuple format, follows on a MAXALIGN boundary */
 }	HashJoinTupleData;
@@ -94,7 +105,12 @@ typedef struct HashJoinTupleData
 typedef struct HashSkewBucket
 {
 	uint32		hashvalue;		/* common hash value */
-	HashJoinTuple tuples;		/* linked list of inner-relation tuples */
+	/* linked list of inner-relation tuples */
+	union
+	{
+		dsa_pointer shared;
+		HashJoinTuple private;
+	} tuples;
 } HashSkewBucket;
 
 #define SKEW_BUCKET_OVERHEAD  MAXALIGN(sizeof(HashSkewBucket))
@@ -103,8 +119,9 @@ typedef struct HashSkewBucket
 #define SKEW_MIN_OUTER_FRACTION  0.01
 
 /*
- * To reduce palloc overhead, the HashJoinTuples for the current batch are
- * packed in 32kB buffers instead of pallocing each tuple individually.
+ * To reduce palloc/dsa_allocate overhead, the HashJoinTuples for the current
+ * batch are packed in 32kB buffers instead of pallocing each tuple
+ * individually.
  */
 typedef struct HashMemoryChunkData
 {
@@ -112,17 +129,118 @@ typedef struct HashMemoryChunkData
 	size_t		maxlen;			/* size of the buffer holding the tuples */
 	size_t		used;			/* number of buffer bytes already used */
 
-	struct HashMemoryChunkData *next;	/* pointer to the next chunk (linked
-										 * list) */
+	/* pointer to the next chunk (linked  list) */
+	union
+	{
+		dsa_pointer shared;
+		struct HashMemoryChunkData *private;
+	} next;
 
 	char		data[FLEXIBLE_ARRAY_MEMBER];	/* buffer allocated at the end */
 }	HashMemoryChunkData;
 
 typedef struct HashMemoryChunkData *HashMemoryChunk;
 
+
+
 #define HASH_CHUNK_SIZE			(32 * 1024L)
 #define HASH_CHUNK_THRESHOLD	(HASH_CHUNK_SIZE / 4)
 
+/*
+ * Read head position in a shared batch file.
+ */
+typedef struct HashJoinBatchPosition
+{
+	int fileno;
+	off_t offset;
+} HashJoinBatchPosition;
+
+/*
+ * The state exposed in shared memory for each participant to coordinate
+ * reading of batch files that it wrote.
+ */
+typedef struct HashJoinSharedBatchReader
+{
+	int batchno;				/* the batch number we are currently reading */
+
+	LWLock lock;				/* protects access to the members below */
+	bool error;					/* has an IO error occurred? */
+	HashJoinBatchPosition head;	/* shared read head for current batch */
+} HashJoinSharedBatchReader;
+
+/*
+ * The state exposed in shared memory by each participant allowing its batch
+ * files to be read by other participants.
+ */
+typedef struct HashJoinParticipantState
+{
+	/*
+	 * Arrays of pointers to arrays of pointers to BufFileDesciptor objects
+	 * exported by this participant.  The descriptor for batch i is in slot
+	 * i % (1 << fls(i - 1)) of the array at index fls(i).
+	 *
+	 * This arrangement means that we can modify future batches without
+	 * moving/reallocating the current batch.  The current batch is therefore
+	 * immutable and accessible by other backends which need to read it.
+	 */
+	dsa_pointer inner_batch_descriptors[32];	/* number of bits in batchno */
+	dsa_pointer outer_batch_descriptors[32];
+
+	/*
+	 * The shared state used to coordinate reading from the current batch.  We
+	 * need separate objects for the outer and inner side, because in the
+	 * probing phase some participants can be reading from the outer batch,
+	 * while others can be reading from the inner side to preload the next
+	 * batch.
+	 */
+	HashJoinSharedBatchReader inner_batch_reader;
+	HashJoinSharedBatchReader outer_batch_reader;
+} HashJoinParticipantState;
+
+/*
+ * The state used by each backend to manage reading from batch files written
+ * by all participants.
+ */
+typedef struct HashJoinBatchReader
+{
+	int participant_number;				/* read which participant's batch? */
+	HashJoinSharedBatchReader *shared;	/* holder of the shared read head */
+	BufFile *file;						/* the file opened in this backend */
+	HashJoinBatchPosition head;			/* local read head position */
+} HashJoinBatchReader;
+
+/*
+ * State for a shared hash join table.  Each backend participating in a hash
+ * join with a shared hash table also has a HashJoinTableData object in
+ * backend-private memory, which points to this shared state in the DSM
+ * segment.
+ */
+typedef struct SharedHashJoinTableData
+{
+	Barrier barrier;				/* for synchronizing workers */
+	dsa_pointer primary_buckets;	/* primary hash table */
+	dsa_pointer secondary_buckets;	/* hash table for preloading next batch */
+	bool at_least_one_worker;		/* did at least one worker join in time? */
+	int nbuckets;
+	int nbuckets_optimal;
+	pg_atomic_uint32 next_unmatched_bucket;
+	pg_atomic_uint64 total_primary_tuples;
+	pg_atomic_uint64 total_secondary_tuples;
+	dsa_pointer_atomic head_primary_chunk;
+	dsa_pointer_atomic head_secondary_chunk;
+	dsa_pointer_atomic chunks_to_rebucket;
+	int planned_participants;		/* number of planned workers + leader */
+
+	/* state exposed by each participant for sharing batches */
+	HashJoinParticipantState participants[FLEXIBLE_ARRAY_MEMBER];
+} SharedHashJoinTableData;
+
+typedef union HashJoinBucketHead
+{
+	dsa_pointer_atomic shared;
+	HashJoinTuple private;
+} HashJoinBucketHead;
+
 typedef struct HashJoinTableData
 {
 	int			nbuckets;		/* # buckets in the in-memory hash table */
@@ -134,9 +252,11 @@ typedef struct HashJoinTableData
 	int			log2_nbuckets_optimal;	/* log2(nbuckets_optimal) */
 
 	/* buckets[i] is head of list of tuples in i'th in-memory bucket */
-	struct HashJoinTupleData **buckets;
+	HashJoinBucketHead *buckets;
 	/* buckets array is per-batch storage, as are all the tuples */
 
+	HashJoinBucketHead *next_buckets;	/* for preloading next batch */
+
 	bool		keepNulls;		/* true to store unmatchable NULL tuples */
 
 	bool		skewEnabled;	/* are we using skew optimization? */
@@ -185,7 +305,71 @@ typedef struct HashJoinTableData
 	MemoryContext batchCxt;		/* context for this-batch-only storage */
 
 	/* used for dense allocation of tuples (into linked chunks) */
-	HashMemoryChunk chunks;		/* one list for the whole batch */
+	HashMemoryChunk primary_chunk;		/* current chunk for this batch */
+	HashMemoryChunk secondary_chunk;	/* current chunk for next batch */
+	HashMemoryChunk chunks_to_rebucket;	/* after resizing table */
+	dsa_pointer primary_chunk_shared;	/* DSA pointer to primary_chunk */
+	dsa_pointer secondary_chunk_shared;	/* DSA pointer to secondary_chunk */
+
+	/* State for coordinating shared tables for parallel hash joins. */
+	dsa_area *area;
+	SharedHashJoinTableData *shared;	/* the shared state */
+	int attached_at_phase;				/* the phase this participant joined */
+	bool detached_early;				/* did we decide to detach early? */
+	HashJoinBatchReader batch_reader;	/* state for reading batches in */
 }	HashJoinTableData;
 
+/* Check if a HashJoinTable is shared by parallel workers. */
+#define HashJoinTableIsShared(table) ((table)->shared != NULL)
+
+/* The phases of parallel hash computation. */
+#define PHJ_PHASE_INIT					0
+#define PHJ_PHASE_CREATING				1
+#define PHJ_PHASE_HASHING				2
+#define PHJ_PHASE_RESIZING  			3
+#define PHJ_PHASE_REBUCKETING 			4
+#define PHJ_PHASE_PROBING				5	/* PHJ_PHASE_PROBING_BATCH(0) */
+#define PHJ_PHASE_UNMATCHED				6	/* PHJ_PHASE_UNMATCHED_BATCH(0) */
+
+/* The subphases for batches. */
+#define PHJ_SUBPHASE_PROMOTING			0
+#define PHJ_SUBPHASE_LOADING			1
+#define PHJ_SUBPHASE_PROBING			2
+#define PHJ_SUBPHASE_UNMATCHED			3
+
+/* The phases of parallel processing for batch(n). */
+#define PHJ_PHASE_PROMOTING_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 4 - 3)
+#define PHJ_PHASE_LOADING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 4 - 2)
+#define PHJ_PHASE_PROBING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 4 - 1)
+#define PHJ_PHASE_UNMATCHED_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 4 - 0)
+
+/* Phase number -> sub-phase within a batch. */
+#define PHJ_PHASE_TO_SUBPHASE(p)										\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) % 4)
+
+/* Phase number -> batch number. */
+#define PHJ_PHASE_TO_BATCHNO(p)											\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) / 4)
+
+/*
+ * Is a given phase one in which a new hash table array is being assigned by
+ * one elected backend?  That includes initial creation, reallocation during
+ * resize, and promotion of secondary hash table to primary.  Workers that
+ * show up and attach at an arbitrary time must wait such phases out before
+ * doing anything with the hash table.
+ */
+#define PHJ_PHASE_MUTATING_TABLE(p)									\
+	((p) == PHJ_PHASE_CREATING ||									\
+	 (p) == PHJ_PHASE_RESIZING ||									\
+	 (PHJ_PHASE_TO_BATCHNO(p) > 0 &&								\
+	  PHJ_PHASE_TO_SUBPHASE(p) == PHJ_SUBPHASE_PROMOTING))
+
+/*
+ * Return the 'participant number' for a process participating in a parallel
+ * hash join.  We give a number < hashtable->shared->planned_participants
+ * to each potential participant, including the leader.
+ */
+#define HashJoinParticipantNumber() \
+	(IsParallelWorker() ? ParallelWorkerNumber + 1 : 0)
+
 #endif   /* HASHJOIN_H */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index 8cf6d15..d208981 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -22,12 +22,12 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(Hash *node, List *hashOperators,
+extern HashJoinTable ExecHashTableCreate(HashState *node, List *hashOperators,
 					bool keepNulls);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
 extern void ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue);
+					uint32 hashvalue, bool secondary);
 extern bool ExecHashGetHashValue(HashJoinTable hashtable,
 					 ExprContext *econtext,
 					 List *hashkeys,
@@ -49,5 +49,8 @@ extern void ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 						int *numbatches,
 						int *num_skew_mcvs);
 extern int	ExecHashGetSkewBucket(HashJoinTable hashtable, uint32 hashvalue);
+extern void ExecHashPreloadNextBatch(HashJoinTable hashtable);
+extern void ExecHashUpdate(HashJoinTable hashtable);
+extern bool ExecHashCheckForEarlyExit(HashJoinTable hashtable);
 
 #endif   /* NODEHASH_H */
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index f24127a..7d07788 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -14,15 +14,25 @@
 #ifndef NODEHASHJOIN_H
 #define NODEHASHJOIN_H
 
+#include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "storage/buffile.h"
+#include "storage/shm_toc.h"
 
 extern HashJoinState *ExecInitHashJoin(HashJoin *node, EState *estate, int eflags);
 extern TupleTableSlot *ExecHashJoin(HashJoinState *node);
 extern void ExecEndHashJoin(HashJoinState *node);
 extern void ExecReScanHashJoin(HashJoinState *node);
 
-extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr);
+extern void ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno, bool inner);
+extern void ExecHashJoinInitializeBatchReader(HashJoinTable hashtable,
+					  int batchno, bool inner);
+extern void ExecHashJoinResetBatchReaders(HashJoinTable hashtable);
+
+extern void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc);
 
 #endif   /* NODEHASHJOIN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2fadf76..9ae55be 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1738,6 +1738,7 @@ typedef struct MergeJoinState
 /* these structs are defined in executor/hashjoin.h: */
 typedef struct HashJoinTupleData *HashJoinTuple;
 typedef struct HashJoinTableData *HashJoinTable;
+typedef struct SharedHashJoinTableData *SharedHashJoinTable;
 
 typedef struct HashJoinState
 {
@@ -1759,6 +1760,7 @@ typedef struct HashJoinState
 	int			hj_JoinState;
 	bool		hj_MatchedOuter;
 	bool		hj_OuterNotEmpty;
+	SharedHashJoinTable hj_sharedHashJoinTable;
 } HashJoinState;
 
 
@@ -1982,6 +1984,9 @@ typedef struct HashState
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
 	List	   *hashkeys;		/* list of ExprState nodes */
 	/* hashkeys is same as parent's hj_InnerHashKeys */
+
+	/* The following are the same as the parent's. */
+	SharedHashJoinTable shared_table_data;
 } HashState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index e2fbc7d..e8f90d9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -782,6 +782,7 @@ typedef struct Hash
 	bool		skewInherit;	/* is outer join rel an inheritance tree? */
 	Oid			skewColType;	/* datatype of the outer key column */
 	int32		skewColTypmod;	/* typmod of the outer key column */
+	bool		shared_table;	/* table shared by multiple workers? */
 	/* all other info is in the parent HashJoin node */
 } Hash;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3a1255a..8b06551 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1258,6 +1258,16 @@ typedef struct MergePath
 	bool		materialize_inner;		/* add Materialize to inner? */
 } MergePath;
 
+typedef enum
+{
+	/* Every worker builds its own private copy of the hash table. */
+	HASHPATH_TABLE_PRIVATE,
+	/* One worker builds a shared hash table, and all workers probe it. */
+	HASHPATH_TABLE_SHARED_SERIAL,
+	/* All workers build a shared hash table, and then probe it. */
+	HASHPATH_TABLE_SHARED_PARALLEL
+} HashPathTableType;
+
 /*
  * A hashjoin path has these fields.
  *
@@ -1272,6 +1282,7 @@ typedef struct HashPath
 	JoinPath	jpath;
 	List	   *path_hashclauses;		/* join clauses used for hashing */
 	int			num_batches;	/* number of batches expected */
+	HashPathTableType table_type;		/* level of sharedness */
 } HashPath;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 2a4df2f..7bb0d1d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -24,6 +24,7 @@
 #define DEFAULT_SEQ_PAGE_COST  1.0
 #define DEFAULT_RANDOM_PAGE_COST  4.0
 #define DEFAULT_CPU_TUPLE_COST	0.01
+#define DEFAULT_CPU_SHARED_TUPLE_COST 0.0
 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
 #define DEFAULT_CPU_OPERATOR_COST  0.0025
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
@@ -48,6 +49,7 @@ typedef enum
 extern PGDLLIMPORT double seq_page_cost;
 extern PGDLLIMPORT double random_page_cost;
 extern PGDLLIMPORT double cpu_tuple_cost;
+extern PGDLLIMPORT double cpu_shared_tuple_cost;
 extern PGDLLIMPORT double cpu_index_tuple_cost;
 extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
@@ -144,7 +146,8 @@ extern void initial_cost_hashjoin(PlannerInfo *root,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors);
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type);
 extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 					JoinCostWorkspace *workspace,
 					SpecialJoinInfo *sjinfo,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 71d9154..5f4ca87 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -134,7 +134,8 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses);
+					 List *hashclauses,
+					 HashPathTableType table_type);
 
 extern ProjectionPath *create_projection_path(PlannerInfo *root,
 					   RelOptInfo *rel,
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0b85b7a..519b2e6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -785,7 +785,23 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_HASH_CREATING,
+	WAIT_EVENT_HASH_HASHING,
+	WAIT_EVENT_HASH_RESIZING,
+	WAIT_EVENT_HASH_REBUCKETING,
+	WAIT_EVENT_HASH_INIT,
+	WAIT_EVENT_HASH_DESTROY,
+	WAIT_EVENT_HASH_UNMATCHED,
+	WAIT_EVENT_HASH_PROMOTING,
+	WAIT_EVENT_HASHJOIN_PROMOTING,
+	WAIT_EVENT_HASHJOIN_PROBING,
+	WAIT_EVENT_HASHJOIN_SKIP_LOADING,
+	WAIT_EVENT_HASHJOIN_SKIP_PROBING,
+	WAIT_EVENT_HASHJOIN_LOADING,
+	WAIT_EVENT_HASHJOIN_REWINDING,
+	WAIT_EVENT_HASHJOIN_REWINDING2, /* TODO: rename me */
+	WAIT_EVENT_HASHJOIN_REWINDING3 /* TODO: rename me */
 } WaitEventIPC;
 
 /* ----------
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 809e596..044262d 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -30,12 +30,17 @@
 
 typedef struct BufFile BufFile;
 
+typedef struct BufFileDescriptor BufFileDescriptor;
+
 /*
  * prototypes for functions in buffile.c
  */
 
 extern BufFile *BufFileCreateTemp(bool interXact);
 extern void BufFileClose(BufFile *file);
+extern BufFileDescriptor *BufFileExport(BufFile *file);
+extern BufFile *BufFileImport(BufFileDescriptor *descriptor);
+extern size_t BufFileDescriptorSize(const BufFileDescriptor *descriptor);
 extern size_t BufFileRead(BufFile *file, void *ptr, size_t size);
 extern size_t BufFileWrite(BufFile *file, void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#2)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Nov 1, 2016 at 5:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Please find a WIP patch attached. Everything related to batch reading
is not currently in a working state, which breaks multi-batch joins,
but many single batch cases work correctly. In an earlier version I
had multi-batch joins working but was before I started tackling
problems 2 and 3 listed in my earlier message.

Here is a better version with code to handle multi-batch joins. The
BufFile sharing approach to reading other participants' batch files is
a straw-man (perhaps what we really want would look more like a shared
tuplestore?), but solves the immediate problem I described earlier so
I can focus on other aspects of the problem. There may be some issues
with cleanup though, more on that soon.

Here's a summary of how this patch chops the hash join up into phases.
The 'phase' is an integer that encodes the step we're up to in the
algorithm, including the current batch number, and I represent that
with macros like PHJ_PHASE_HASHING and PHJ_PHASE_PROBING_BATCH(42).
Each phase is either serial, meaning that one participant does
something special, or parallel meaning that all participants do the
same thing. It goes like this:

* PHJ_PHASE_INIT
The initial phase established by the leader before launching workers.

* PHJ_PHASE_CREATING
Serial: One participant creates the hash table.

* PHJ_PHASE_HASHING
Serial or parallel: Depending on plan, one or all participants
execute the inner plan to completion, building the hash table for
batch 0 and possibly writing tuples to batch files on disk for future
batches.

* PHJ_PHASE_RESIZING
Serial: One participant resizes the hash table if necessary.

* PHJ_PHASE_REBUCKETING
Parallel: If the hash table was resized, all participants rehash all
the tuples in it and insert them into the buckets of the new larger
hash table.

* PHJ_PHASE_PROBING_BATCH(0)
Parallel: All participants execute the outer plan to completion. For
each tuple they either probe the hash table if it's for the current
batch, or write it out to a batch file if it's for a future batch.
For each tuple matched in the hash table, they set the matched bit.
When they are finished probing batch 0, they also preload tuples from
inner batch 1 into a secondary hash table until work_mem is exhausted
(note that at this time work_mem is occupied by the primary hash
table: this is just a way to use any remaining work_mem and extract a
little bit more parallelism, since otherwise every participant would
be waiting for all participants to finish probing; instead we wait for
all paticipants to finish probing AND for spare work_mem to run out).

* PHJ_PHASE_UNMATCHED_BATCH(0)
Parallel: For right/full joins, all participants then scan the hash
table looking for unmatched tuples.

... now we are ready for batch 1 ...

* PHJ_PHASE_PROMOTING_BATCH(1)
Serial: One participant promotes the secondary hash table to become
the new primary hash table.

* PHJ_PHASE_LOADING_BATCH(1)
Parallel: All participants finish loading inner batch 1 into the hash
table (work that was started in the previous probing phase).

* PHJ_PHASE_PREPARING_BATCH(1)
Serial: One participant resets the batch reading heads, so that we
are ready to read from outer batch 1, and inner batch 2.

* PHJ_PHASE_PROBING_BATCH(1)
Parallel: All participants read from outer batch 1 to probe the hash
table, then read from inner batch 2 to preload tuples into the
secondary hash table.

* PHJ_PHASE_UNMATCHED_BATCH(1)
Parallel: For right/full joins, all participants then scan the hash
table looking for unmatched tuples.

... now we are ready for batch 2 ...

Then all participants synchronise a final time to enter batch
PHJ_PHASE_PROMOTING_BATCH(nbatch), which is one past the end and is
the point at which it is safe to clean up. (There may be an
optimisation where I can clean up after the last participant detaches
instead, more on that soon).

Obviously I'm actively working on developing and stabilising all this.
Some of the things I'm working on are: work_mem accounting, batch
increases, rescans and figuring out if the resource management for
those BufFiles is going to work. There are quite a lot of edge cases
some of which I'm still figuring out, but I feel like this approach is
workable. At this stage I want to share what I'm doing to see if
others have feedback, ideas, blood curdling screams of horror, etc. I
will have better patches and a set of test queries soon. Thanks for
reading.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-hash-v2.patchapplication/octet-stream; name=parallel-hash-v2.patchDownload

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0a669d9..1e7d369 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1023,7 +1023,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = sname = "Limit";
 			break;
 		case T_Hash:
-			pname = sname = "Hash";
+			if (((Hash *) plan)->shared_table)
+				pname = sname = "Shared Hash";
+			else
+				pname = sname = "Hash";
 			break;
 		default:
 			pname = sname = "???";
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 72bacd5..c8c39f7 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -27,6 +27,7 @@
 #include "executor/executor.h"
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
+#include "executor/nodeHashjoin.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/tqueue.h"
 #include "nodes/nodeFuncs.h"
@@ -203,6 +204,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 				ExecCustomScanEstimate((CustomScanState *) planstate,
 									   e->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinEstimate((HashJoinState *) planstate,
+									 e->pcxt);
+				break;
 			default:
 				break;
 		}
@@ -255,6 +260,9 @@ ExecParallelInitializeDSM(PlanState *planstate,
 				ExecCustomScanInitializeDSM((CustomScanState *) planstate,
 											d->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeDSM((HashJoinState *) planstate,
+										  d->pcxt);
 			default:
 				break;
 		}
@@ -724,6 +732,10 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
 				ExecCustomScanInitializeWorker((CustomScanState *) planstate,
 											   toc);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeWorker((HashJoinState *) planstate,
+											 toc);
+				break;
 			default:
 				break;
 		}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 6375d9b..0b8d27b 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -25,6 +25,7 @@
 #include <limits.h>
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "commands/tablespace.h"
 #include "executor/execdebug.h"
@@ -32,12 +33,13 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "port/atomics.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
-
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
@@ -47,8 +49,30 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 						uint32 hashvalue,
 						int bucketNumber);
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
+static void ExecHashRebucket(HashJoinTable hashtable);
+static void ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable);
+
+static void add_tuple_count(HashJoinTable hashtable, int count,
+							bool secondary);
+static HashJoinTuple next_tuple_in_bucket(HashJoinTable table,
+										  HashJoinTuple tuple);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int skew_bucket_no);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int bucket_no);
+static void insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+									 HashJoinTuple tuple,
+									 dsa_pointer tuple_pointer);
+static void insert_tuple_into_skew_bucket(HashJoinTable table,
+										  int bucket_no,
+										  HashJoinTuple tuple,
+										  dsa_pointer tuple_pointer);
 
 static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc_shared(HashJoinTable hashtable, Size size,
+								dsa_pointer *chunk_shared,
+								bool secondary);
+
 
 /* ----------------------------------------------------------------
  *		ExecHash
@@ -64,6 +88,100 @@ ExecHash(HashState *node)
 }
 
 /* ----------------------------------------------------------------
+ * 		ExecHashCheckForEarlyExit
+ *
+ *		return true if this process needs to abandon work on the
+ *		hash join to avoid a deadlock
+ * ----------------------------------------------------------------
+ */
+bool
+ExecHashCheckForEarlyExit(HashJoinTable hashtable)
+{
+	/*
+	 * The golden rule of leader deadlock avoidance: since leader processes
+	 * have two separate roles, namely reading from worker queues AND executing
+	 * the same plan as workers, we must never allow a leader to wait for
+	 * workers if there is any possibility those workers have emitted tuples.
+	 * Otherwise we could get into a situation where a worker fills up its
+	 * output tuple queue and begins waiting for the leader to read, while
+	 * the leader is busy waiting for the worker.
+	 *
+	 * Parallel hash joins with shared tables are inherently susceptible to
+	 * such deadlocks because there are points at which all participants must
+	 * wait (you can't start check for unmatched tuples in the hash table until
+	 * probing has completed in all workers, etc).
+	 *
+	 * So we follow these rules:
+	 *
+	 * 1.  If there are workers participating, the leader MUST NOT not
+	 *     participate in any further work after probing the first batch, so
+	 *     that it never has to wait for workers that might have emitted
+	 *     tuples.
+	 *
+	 * 2.  If there are no workers participating, the leader MUST run all the
+	 *     batches to completion, because that's the only way for the join
+	 *     to complete.  There is no deadlock risk if there are no workers.
+	 *
+	 * 3.  Workers MUST NOT participate if the hashing phase has finished by
+	 *     the time they have joined, so that the leader can reliably determine
+	 *     whether there are any workers running when it comes to the point
+	 *     where it must choose between 1 and 2.
+	 *
+	 * In other words, if the leader makes it all the way through hashing and
+	 * probing before any workers show up, then the leader will run the whole
+	 * hash join on its own.  If workers do show up any time before hashing is
+	 * finished, the leader will stop executing the join after helping probe
+	 * the first batch.   In the unlikely event of the first worker showing up
+	 * after the leader has finished hashing, it will exit because it's too
+	 * late, the leader has already decided to do all the work alone.
+	 */
+
+	if (!IsParallelWorker())
+	{
+		/* Running in the leader process. */
+		if (BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_PROBING &&
+			hashtable->shared->at_least_one_worker)
+		{
+			/* Abandon ship due to rule 1.  There are workers running. */
+			hashtable->detached_early = true;
+		}
+		else
+		{
+			/*
+			 * Continue processing due to rule 2.  There are no workers, and
+			 * any workers that show up later will abandon ship.
+			 */
+		}
+	}
+	else
+	{
+		/* Running in a worker process. */
+		if (hashtable->attached_at_phase < PHJ_PHASE_PROBING)
+		{
+			/*
+			 * Advertise that there are workers, so that the leader can
+			 * choose between rules 1 and 2.  It's OK that several workers can
+			 * write to this variable without immediately memory
+			 * synchronization, because the leader will only read it in a later
+			 * phase (see above).
+			 */
+			hashtable->shared->at_least_one_worker = true;
+		}
+		else
+		{
+			/* Abandon ship due to rule 3. */
+			hashtable->detached_early = true;
+		}
+	}
+
+	/* If we decided to exit early, detach now. */
+	if (hashtable->detached_early)
+		BarrierDetach(&hashtable->shared->barrier);
+
+	return hashtable->detached_early;
+}
+
+/* ----------------------------------------------------------------
  *		MultiExecHash
  *
  *		build hash table for hashjoin, doing partitioning if more
@@ -79,6 +197,7 @@ MultiExecHash(HashState *node)
 	TupleTableSlot *slot;
 	ExprContext *econtext;
 	uint32		hashvalue;
+	Barrier	   *barrier = NULL;
 
 	/* must provide our own instrumentation support */
 	if (node->ps.instrument)
@@ -90,6 +209,55 @@ MultiExecHash(HashState *node)
 	outerNode = outerPlanState(node);
 	hashtable = node->hashtable;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Synchronize parallel hash table builds.  At this stage we know that
+		 * the shared hash table has been created, but we don't know if our
+		 * peers are still in MultiExecHash and if so how far through.  We use
+		 * the phase to synchronize with them.
+		 */
+		barrier = &hashtable->shared->barrier;
+
+		switch (BarrierPhase(barrier))
+		{
+		case PHJ_PHASE_INIT:
+			/* ExecHashTableCreate already handled this phase. */
+			Assert(false);
+		case PHJ_PHASE_CREATING:
+			/* Wait for serial phase, and then either hash or wait. */
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_CREATING))
+				goto hash;
+			else if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_HASHING:
+			/* Hashing is already underway.  Can we join in? */
+			if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_RESIZING:
+			/* Can't help with serial phase. */
+			goto post_resize;
+		case PHJ_PHASE_REBUCKETING:
+			/* Rebucketing is in progress.  Let's help do that. */
+			goto rebucket;
+		default:
+			/* The hash table building work is already finished. */
+			goto finish;
+		}
+	}
+
+ hash:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Make sure our local hashtable is up-to-date so we can hash. */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/*
 	 * set expression context
 	 */
@@ -123,22 +291,98 @@ MultiExecHash(HashState *node)
 			else
 			{
 				/* Not subject to skew optimization, so insert normally */
-				ExecHashTableInsert(hashtable, slot, hashvalue);
+				ExecHashTableInsert(hashtable, slot, hashvalue, false);
 			}
-			hashtable->totalTuples += 1;
+			/*
+			 * Shared tuple counters are managed by dense_alloc_shared.  For
+			 * private hash tables we maintain the counter here.
+			 */
+			if (!HashJoinTableIsShared(hashtable))
+				hashtable->totalTuples += 1;
 		}
 	}
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Update shared tuple count for the current chunk.  Other chunks are
+		 * accounted for already, when new chunks are allocated.
+		 */
+		if (hashtable->primary_chunk != NULL)
+			add_tuple_count(hashtable, hashtable->primary_chunk->ntuples,
+							false);
+	}
+
+ post_hash:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		bool elected_to_resize;
+
+		/*
+		 * Wait for all backends to finish hashing.  If only one worker is
+		 * running the hashing phase because of a non-partial inner plan, the
+		 * other workers will pile up here waiting.  If multiple worker are
+		 * hashing, they should finish close to each other in time.
+		 */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		elected_to_resize = BarrierWait(barrier, WAIT_EVENT_HASH_HASHING);
+		/*
+		 * Resizing is a serial phase.  All but one should skip ahead to
+		 * rebucketing, but all workers should update their copy of the shared
+		 * tuple count with the final total first.
+		 */
+		hashtable->totalTuples =
+			pg_atomic_read_u64(&hashtable->shared->total_primary_tuples);
+		if (!elected_to_resize)
+			goto post_resize;
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+	}
+
 	/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-	if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-		ExecHashIncreaseNumBuckets(hashtable);
+	ExecHashIncreaseNumBuckets(hashtable);
+
+ post_resize:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+		BarrierWait(&hashtable->shared->barrier,
+					WAIT_EVENT_HASH_RESIZING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+	}
+
+ rebucket:
+	/* If the table was resized, insert tuples into the new buckets. */
+	ExecHashUpdate(hashtable);
+	ExecHashRebucket(hashtable);
 
 	/* Account for the buckets in spaceUsed (reported in EXPLAIN ANALYZE) */
-	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinTuple);
+	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinBucketHead);
 	if (hashtable->spaceUsed > hashtable->spacePeak)
 		hashtable->spacePeak = hashtable->spaceUsed;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+		BarrierWait(barrier, WAIT_EVENT_HASH_REBUCKETING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING);
+	}
+
+ finish:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * All hashing work has finished.  The other workers may be probing or
+		 * processing unmatched tuples for the initial batch, or dealing with
+		 * later batches.  The next synchronization point is in ExecHashJoin's
+		 * HJ_BUILD_HASHTABLE case, which will figure that out and synchronize
+		 * its local state machine with the parallel processing group's phase.
+		 */
+		Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/* must provide our own instrumentation support */
+	/* TODO: report only the tuples that WE hashed here? */
 	if (node->ps.instrument)
 		InstrStopNode(node->ps.instrument, hashtable->totalTuples);
 
@@ -243,8 +487,9 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
+ExecHashTableCreate(HashState *state, List *hashOperators, bool keepNulls)
 {
+	Hash	   *node;
 	HashJoinTable hashtable;
 	Plan	   *outerNode;
 	int			nbuckets;
@@ -261,6 +506,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 * "outer" subtree of this node, but the inner relation of the hashjoin).
 	 * Compute the appropriate size of the hash table.
 	 */
+	node = (Hash *) state->ps.plan;
 	outerNode = outerPlan(node);
 
 	ExecChooseHashTableSize(outerNode->plan_rows, outerNode->plan_width,
@@ -305,7 +551,14 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	hashtable->spaceUsedSkew = 0;
 	hashtable->spaceAllowedSkew =
 		hashtable->spaceAllowed * SKEW_WORK_MEM_PERCENT / 100;
-	hashtable->chunks = NULL;
+	hashtable->primary_chunk = NULL;
+	hashtable->secondary_chunk = NULL;
+	hashtable->chunks_to_rebucket = NULL;
+	hashtable->primary_chunk_shared = InvalidDsaPointer;
+	hashtable->secondary_chunk_shared = InvalidDsaPointer;
+	hashtable->area = state->ps.state->es_query_area;
+	hashtable->shared = state->shared_table_data;
+	hashtable->detached_early = false;
 
 #ifdef HJDEBUG
 	printf("Hashjoin %p: initial nbatch = %d, nbuckets = %d\n",
@@ -368,23 +621,101 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 		PrepareTempTablespaces();
 	}
 
-	/*
-	 * Prepare context for the first-scan space allocations; allocate the
-	 * hashbucket array therein, and set each bucket "empty".
-	 */
-	MemoryContextSwitchTo(hashtable->batchCxt);
+	MemoryContextSwitchTo(oldcxt);
 
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier;
 
-	/*
-	 * Set up for skew optimization, if possible and there's a need for more
-	 * than one batch.  (In a one-batch join, there's no point in it.)
-	 */
-	if (nbatch > 1)
-		ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+		/*
+		 * Attach to the barrier.  The corresponding detach operation is in
+		 * ExecHashTableDestroy.
+		 */
+		barrier = &hashtable->shared->barrier;
+		hashtable->attached_at_phase = BarrierAttach(barrier);
 
-	MemoryContextSwitchTo(oldcxt);
+		/*
+		 * So far we have no idea whether there are any other participants, and
+		 * if so, what phase they are working on.  The only thing we care about
+		 * at this point is whether someone has already created the shared
+		 * hash table yet.  If not, one backend will be elected to do that
+		 * now.
+		 */
+		if (BarrierPhase(barrier) == PHJ_PHASE_INIT)
+		{
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_INIT))
+			{
+				/* Serial phase: create the hash tables */
+				Size bytes;
+				HashJoinBucketHead *buckets;
+				int i;
+				SharedHashJoinTable shared;
+				dsa_area *area;
+
+				shared = hashtable->shared;
+				area = hashtable->area;
+				bytes = nbuckets * sizeof(HashJoinBucketHead);
+
+				/* Allocate the primary and secondary hash tables. */
+				shared->primary_buckets = dsa_allocate(area, bytes);
+				shared->secondary_buckets = dsa_allocate(area, bytes);
+				if (!DsaPointerIsValid(shared->primary_buckets) ||
+					!DsaPointerIsValid(shared->secondary_buckets))
+					ereport(ERROR,
+							(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+							 errmsg("out of memory")));
+
+				/* Set up primary table's buckets. */
+				buckets = dsa_get_address(area, shared->primary_buckets);
+				for (i = 0; i < nbuckets; ++i)
+					dsa_pointer_atomic_init(&buckets[i].shared,
+											InvalidDsaPointer);
+				/* Set up secondary table's buckets. */
+				buckets = dsa_get_address(area, shared->secondary_buckets);
+				for (i = 0; i < nbuckets; ++i)
+					dsa_pointer_atomic_init(&buckets[i].shared,
+											InvalidDsaPointer);
+
+				/* Initialize the rest of parallel_state. */
+				hashtable->shared->nbuckets = nbuckets;
+				pg_atomic_write_u32(&hashtable->shared->next_unmatched_bucket,
+									0);
+				ExecHashJoinRewindBatches(hashtable, 0);
+
+				/* TODO: ExecHashBuildSkewHash */
+
+				/*
+				 * The backend-local pointers in hashtable will be set up by
+				 * ExecHashUpdate, at each point where they might have
+				 * changed.
+				 */
+			}
+			Assert(BarrierPhase(&hashtable->shared->barrier) ==
+				   PHJ_PHASE_CREATING);
+			/* The next synchronization point is in MultiExecHash. */
+		}
+	}
+	else
+	{
+		/*
+		 * Prepare context for the first-scan space allocations; allocate the
+		 * hashbucket array therein, and set each bucket "empty".
+		 */
+		MemoryContextSwitchTo(hashtable->batchCxt);
+
+		hashtable->buckets = (HashJoinBucketHead *)
+			palloc0(nbuckets * sizeof(HashJoinBucketHead));
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/*
+		 * Set up for skew optimization, if possible and there's a need for
+		 * more than one batch.  (In a one-batch join, there's no point in
+		 * it.)
+		 */
+		if (nbatch > 1)
+			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+	}
 
 	return hashtable;
 }
@@ -481,8 +812,8 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * Note that both nbuckets and nbatch must be powers of 2 to make
 	 * ExecHashGetBucketAndBatch fast.
 	 */
-	max_pointers = (work_mem * 1024L) / sizeof(HashJoinTuple);
-	max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
+	max_pointers = (work_mem * 1024L) / sizeof(HashJoinBucketHead);
+	max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinBucketHead));
 	/* If max_pointers isn't a power of 2, must round it down to one */
 	mppow2 = 1L << my_log2(max_pointers);
 	if (max_pointers != mppow2)
@@ -504,7 +835,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * If there's not enough space to store the projected number of tuples and
 	 * the required bucket headers, we will need multiple batches.
 	 */
-	bucket_bytes = sizeof(HashJoinTuple) * nbuckets;
+	bucket_bytes = sizeof(HashJoinBucketHead) * nbuckets;
 	if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
 	{
 		/* We'll need multiple batches */
@@ -519,12 +850,12 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 		 * NTUP_PER_BUCKET tuples, whose projected size already includes
 		 * overhead for the hash code, pointer to the next tuple, etc.
 		 */
-		bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple));
+		bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinBucketHead));
 		lbuckets = 1L << my_log2(hash_table_bytes / bucket_size);
 		lbuckets = Min(lbuckets, max_pointers);
 		nbuckets = (int) lbuckets;
 		nbuckets = 1 << my_log2(nbuckets);
-		bucket_bytes = nbuckets * sizeof(HashJoinTuple);
+		bucket_bytes = nbuckets * sizeof(HashJoinBucketHead);
 
 		/*
 		 * Buckets are simple pointers to hashjoin tuples, while tupsize
@@ -564,6 +895,38 @@ ExecHashTableDestroy(HashJoinTable hashtable)
 {
 	int			i;
 
+	/* Detached, if we haven't already. */
+	if (HashJoinTableIsShared(hashtable) && !hashtable->detached_early)
+	{
+		Barrier *barrier = &hashtable->shared->barrier;
+
+		/*
+		 * TODO: Can we just detach if there is only one batch, but wait here
+		 * if there is more than one (to make sure batch files created by this
+		 * participant are not deleted)?  When detaching, the last one to
+		 * detach should do the cleanup work, and/or leave things in the right
+		 * state for rescanning.
+		 */
+
+		if (BarrierWait(barrier, WAIT_EVENT_HASH_DESTROY))
+		{
+			/* Serial: free the tables */
+			if (DsaPointerIsValid(hashtable->shared->primary_buckets))
+			{
+				dsa_free(hashtable->area,
+						 hashtable->shared->primary_buckets);
+				hashtable->shared->primary_buckets = InvalidDsaPointer;
+			}
+			if (DsaPointerIsValid(hashtable->shared->secondary_buckets))
+			{
+				dsa_free(hashtable->area,
+						 hashtable->shared->secondary_buckets);
+				hashtable->shared->secondary_buckets = InvalidDsaPointer;
+			}
+		}
+		BarrierDetach(&hashtable->shared->barrier);
+	}
+
 	/*
 	 * Make sure all the temp files are closed.  We skip batch 0, since it
 	 * can't have any temp files (and the arrays might not even exist if
@@ -600,6 +963,15 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	long		nfreed;
 	HashMemoryChunk oldchunks;
 
+	/*
+	 * TODO: Implement for shared tables.  It's OK for different workers to
+	 * have different ideas of nbatch for short times, as long as they agree
+	 * at key points in time (ie when deciding if we're finished).  Working on
+	 * this...
+	 */
+	if (HashJoinTableIsShared(hashtable))
+		return;
+
 	/* do nothing if we've decided to shut off growth */
 	if (!hashtable->growEnabled)
 		return;
@@ -661,7 +1033,7 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;
 
 		hashtable->buckets = repalloc(hashtable->buckets,
-								sizeof(HashJoinTuple) * hashtable->nbuckets);
+								sizeof(HashJoinBucketHead) * hashtable->nbuckets);
 	}
 
 	/*
@@ -669,14 +1041,14 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	 * buckets now and not have to keep track which tuples in the buckets have
 	 * already been processed. We will free the old chunks as we go.
 	 */
-	memset(hashtable->buckets, 0, sizeof(HashJoinTuple) * hashtable->nbuckets);
-	oldchunks = hashtable->chunks;
-	hashtable->chunks = NULL;
+	memset(hashtable->buckets, 0, sizeof(HashJoinBucketHead) * hashtable->nbuckets);
+	oldchunks = hashtable->primary_chunk;
+	hashtable->primary_chunk = NULL;
 
 	/* so, let's scan through the old chunks, and all tuples in each chunk */
 	while (oldchunks != NULL)
 	{
-		HashMemoryChunk nextchunk = oldchunks->next;
+		HashMemoryChunk nextchunk = oldchunks->next.private;
 
 		/* position within the buffer (up to oldchunks->used) */
 		size_t		idx = 0;
@@ -699,20 +1071,23 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 				/* keep tuple in memory - copy it into the new chunk */
 				HashJoinTuple copyTuple;
 
-				copyTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+				copyTuple = (HashJoinTuple)
+					dense_alloc(hashtable, hashTupleSize);
 				memcpy(copyTuple, hashTuple, hashTupleSize);
 
 				/* and add it back to the appropriate bucket */
-				copyTuple->next = hashtable->buckets[bucketno];
-				hashtable->buckets[bucketno] = copyTuple;
+				insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+										 InvalidDsaPointer);
 			}
 			else
 			{
 				/* dump it out */
 				Assert(batchno > curbatch);
-				ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple),
+				ExecHashJoinSaveTuple(hashtable,
+									  HJTUPLE_MINTUPLE(hashTuple),
 									  hashTuple->hashvalue,
-									  &hashtable->innerBatchFile[batchno]);
+									  batchno,
+									  true);
 
 				hashtable->spaceUsed -= hashTupleSize;
 				nfreed++;
@@ -758,8 +1133,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 static void
 ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 {
-	HashMemoryChunk chunk;
-
 	/* do nothing if not an increase (it's called increase for a reason) */
 	if (hashtable->nbuckets >= hashtable->nbuckets_optimal)
 		return;
@@ -780,16 +1153,156 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 	 * Just reallocate the proper number of buckets - we don't need to walk
 	 * through them - we can walk the dense-allocated chunks (just like in
 	 * ExecHashIncreaseNumBatches, but without all the copying into new
-	 * chunks)
+	 * chunks): see ExecHashRebucket, which must be called next.
 	 */
-	hashtable->buckets =
-		(HashJoinTuple *) repalloc(hashtable->buckets,
-								hashtable->nbuckets * sizeof(HashJoinTuple));
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Size bytes;
+		int i;
+
+		/* Serial phase: only one backend reallocates. */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_RESIZING);
+
+		/* Free the old arrays. */
+		dsa_free(hashtable->area,
+				 hashtable->shared->primary_buckets);
+		dsa_free(hashtable->area,
+				 hashtable->shared->secondary_buckets);
+		/* Allocate replacements. */
+		bytes = hashtable->nbuckets * sizeof(HashJoinBucketHead);
+		hashtable->shared->primary_buckets =
+			dsa_allocate(hashtable->area, bytes);
+		hashtable->shared->secondary_buckets =
+			dsa_allocate(hashtable->area, bytes);
+		if (!DsaPointerIsValid(hashtable->shared->primary_buckets) ||
+			!DsaPointerIsValid(hashtable->shared->secondary_buckets))
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("out of memory")));
+		/* Initialize empty buckets. */
+		hashtable->buckets =
+			dsa_get_address(hashtable->area,
+							hashtable->shared->primary_buckets);
+		for (i = 0; i < hashtable->nbuckets; ++i)
+			dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+									 InvalidDsaPointer);
+		hashtable->next_buckets =
+			dsa_get_address(hashtable->area,
+							hashtable->shared->secondary_buckets);
+		for (i = 0; i < hashtable->nbuckets; ++i)
+			dsa_pointer_atomic_write(&hashtable->next_buckets[i].shared,
+									 InvalidDsaPointer);
+		hashtable->shared->nbuckets = hashtable->nbuckets;
+		/* Move all primary chunks to the rebucket list. */
+		dsa_pointer_atomic_write(&hashtable->shared->chunks_to_rebucket,
+								 dsa_pointer_atomic_read(&hashtable->shared->head_primary_chunk));
+		dsa_pointer_atomic_write(&hashtable->shared->head_primary_chunk,
+								 InvalidDsaPointer);
+	}
+	else
+	{
+		hashtable->buckets =
+			(HashJoinBucketHead *) repalloc(hashtable->buckets,
+											hashtable->nbuckets * sizeof(HashJoinBucketHead));
+
+		memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinBucketHead));
+		/* Move all chunks to the rebucket list. */
+		hashtable->chunks_to_rebucket = hashtable->primary_chunk;
+		hashtable->primary_chunk = NULL;
+	}
+}
+
+/*
+ * Pop a memory chunk from a given list atomically.  Returns a backend-local
+ * pointer to the chunk, or NULL if the list is empty.  Also sets *chunk_out
+ * to the dsa_pointer to the chunk.
+ */
+static HashMemoryChunk
+ExecHashPopChunk(HashJoinTable hashtable,
+				 dsa_pointer *chunk_out,
+				 dsa_pointer_atomic *head)
+{
+	HashMemoryChunk chunk = NULL;
+
+	/*
+	 * We could see a stale empty list and exist early without a barrier, so
+	 * explicitly include one before we read the head of the list for the
+	 * first time.
+	 */
+	pg_read_barrier();
 
-	memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinTuple));
+	for (;;)
+	{
+		*chunk_out = dsa_pointer_atomic_read(head);
+		if (!DsaPointerIsValid(*chunk_out))
+		{
+			chunk = NULL;
+			break;
+		}
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, *chunk_out);
+		if (dsa_pointer_atomic_compare_exchange(head,
+												chunk_out,
+												chunk->next.shared))
+			break;
+	}
+
+	return chunk;
+}
+
+/*
+ * Push a shared memory chunk onto a given list atomically.
+ */
+static void
+ExecHashPushChunk(HashJoinTable hashtable,
+				  HashMemoryChunk chunk,
+				  dsa_pointer chunk_shared,
+				  dsa_pointer_atomic *head)
+{
+	Assert(chunk == dsa_get_address(hashtable->area, chunk_shared));
+
+	for (;;)
+	{
+		chunk->next.shared = dsa_pointer_atomic_read(head);
+		if (dsa_pointer_atomic_compare_exchange(head,
+												&chunk->next.shared,
+												chunk_shared))
+			break;
+	}
+}
+
+/*
+ * ExecHashRebucket
+ *		insert the tuples from all chunks into the correct bucket
+ */
+static void
+ExecHashRebucket(HashJoinTable hashtable)
+{
+	HashMemoryChunk chunk;
+	dsa_pointer chunk_shared;
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * This is a parallel phase.  Workers will atomically pop one chunk at
+		 * a time and rebucket all of its tuples.
+		 */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_REBUCKETING);
+	}
 
-	/* scan through all tuples in all chunks to rebuild the hash table */
-	for (chunk = hashtable->chunks; chunk != NULL; chunk = chunk->next)
+	/*
+	 * Scan through all tuples in all chunks in the rebucket list to rebuild
+	 * the hash table.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+		chunk =
+			ExecHashPopChunk(hashtable, &chunk_shared,
+							 &hashtable->shared->chunks_to_rebucket);
+	else
+		chunk = hashtable->chunks_to_rebucket;
+	while (chunk != NULL)
 	{
 		/* process all tuples stored in this chunk */
 		size_t		idx = 0;
@@ -797,6 +1310,8 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 		while (idx < chunk->used)
 		{
 			HashJoinTuple hashTuple = (HashJoinTuple) (chunk->data + idx);
+			dsa_pointer hashTuple_shared = chunk_shared +
+				offsetof(HashMemoryChunkData, data) + idx;
 			int			bucketno;
 			int			batchno;
 
@@ -804,16 +1319,52 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 									  &bucketno, &batchno);
 
 			/* add the tuple to the proper bucket */
-			hashTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = hashTuple;
+			insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+									 hashTuple_shared);
 
 			/* advance index past the tuple */
 			idx += MAXALIGN(HJTUPLE_OVERHEAD +
 							HJTUPLE_MINTUPLE(hashTuple)->t_len);
 		}
+
+		/* Push chunk onto regular list and move to next chunk. */
+		if (HashJoinTableIsShared(hashtable))
+		{
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_primary_chunk);
+			chunk =
+				ExecHashPopChunk(hashtable, &chunk_shared,
+								 &hashtable->shared->chunks_to_rebucket);
+		}
+		else
+		{
+			HashMemoryChunk next = chunk->next.private;
+
+			chunk->next.private = hashtable->primary_chunk;
+			hashtable->primary_chunk = chunk;
+			chunk = next;
+		}
 	}
 }
 
+static void
+ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable)
+{
+	double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+
+	/*
+	 * Guard against integer overflow and alloc size overflow.  The
+	 * MaxAllocSize limitation doesn't really apply for shared hash tables,
+	 * since DSA has no such limit, but for now let's apply the same limit.
+	 */
+	while (ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET) &&
+		   hashtable->nbuckets_optimal <= INT_MAX / 2 &&
+		   hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinBucketHead))
+	{
+		hashtable->nbuckets_optimal *= 2;
+		hashtable->log2_nbuckets_optimal += 1;
+	}
+}
 
 /*
  * ExecHashTableInsert
@@ -829,7 +1380,8 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 void
 ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue)
+					uint32 hashvalue,
+					bool secondary)
 {
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	int			bucketno;
@@ -848,11 +1400,17 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		 */
 		HashJoinTuple hashTuple;
 		int			hashTupleSize;
-		double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+		dsa_pointer hashTuple_shared = InvalidDsaPointer;
 
 		/* Create the HashJoinTuple */
 		hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-		hashTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+		if (HashJoinTableIsShared(hashtable))
+			hashTuple = (HashJoinTuple)
+				dense_alloc_shared(hashtable, hashTupleSize,
+								   &hashTuple_shared, secondary);
+		else
+			hashTuple = (HashJoinTuple)
+				dense_alloc(hashtable, hashTupleSize);
 
 		hashTuple->hashvalue = hashvalue;
 		memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
@@ -866,32 +1424,23 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 		/* Push it onto the front of the bucket's list */
-		hashTuple->next = hashtable->buckets[bucketno];
-		hashtable->buckets[bucketno] = hashTuple;
+		insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+								 hashTuple_shared);
 
 		/*
 		 * Increase the (optimal) number of buckets if we just exceeded the
 		 * NTUP_PER_BUCKET threshold, but only when there's still a single
 		 * batch.
 		 */
-		if (hashtable->nbatch == 1 &&
-			ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET))
-		{
-			/* Guard against integer overflow and alloc size overflow */
-			if (hashtable->nbuckets_optimal <= INT_MAX / 2 &&
-				hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinTuple))
-			{
-				hashtable->nbuckets_optimal *= 2;
-				hashtable->log2_nbuckets_optimal += 1;
-			}
-		}
+		if (hashtable->nbatch == 1)
+			ExecHashTableComputeOptimalBuckets(hashtable);
 
 		/* Account for space used, and back off if we've used too much */
 		hashtable->spaceUsed += hashTupleSize;
 		if (hashtable->spaceUsed > hashtable->spacePeak)
 			hashtable->spacePeak = hashtable->spaceUsed;
 		if (hashtable->spaceUsed +
-			hashtable->nbuckets_optimal * sizeof(HashJoinTuple)
+			hashtable->nbuckets_optimal * sizeof(HashJoinBucketHead)
 			> hashtable->spaceAllowed)
 			ExecHashIncreaseNumBatches(hashtable);
 	}
@@ -901,9 +1450,11 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		 * put the tuple into a temp file for later batches
 		 */
 		Assert(batchno > hashtable->curbatch);
-		ExecHashJoinSaveTuple(tuple,
+		ExecHashJoinSaveTuple(hashtable,
+							  tuple,
 							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+							  batchno,
+							  true);
 	}
 }
 
@@ -1047,6 +1598,138 @@ ExecHashGetBucketAndBatch(HashJoinTable hashtable,
 }
 
 /*
+ * Update the local hashtable with the current pointers and sizes from
+ * hashtable->parallel_state.
+ */
+void
+ExecHashUpdate(HashJoinTable hashtable)
+{
+	Barrier *barrier;
+
+	if (!HashJoinTableIsShared(hashtable))
+		return;
+
+	barrier = &hashtable->shared->barrier;
+
+	/*
+	 * This should only be called in a phase when the hash table is not being
+	 * mutated (ie resized, swapped etc).
+	 */
+	Assert(!PHJ_PHASE_MUTATING_TABLE(
+		BarrierPhase(&hashtable->shared->barrier)));
+
+	/* The primary hash table. */
+	hashtable->buckets = (HashJoinBucketHead *)
+		dsa_get_address(hashtable->area,
+						hashtable->shared->primary_buckets);
+	hashtable->nbuckets = hashtable->shared->nbuckets;
+	hashtable->log2_nbuckets = my_log2(hashtable->nbuckets);
+	/* The secondary hash table, if there is one (NULL for initial batch). */
+	hashtable->next_buckets = (HashJoinBucketHead *)
+		dsa_get_address(hashtable->area,
+						hashtable->shared->secondary_buckets);
+
+	hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(BarrierPhase(barrier));
+}
+
+/*
+ * Get the next tuple in the same bucket as 'tuple'.
+ */
+static HashJoinTuple
+next_tuple_in_bucket(HashJoinTable table, HashJoinTuple tuple)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area, tuple->next.shared);
+	else
+		return tuple->next.private;
+}
+
+/*
+ * Get the first tuple in a given skew bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_skew_bucket(HashJoinTable table, int skew_bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area,
+							table->skewBucket[skew_bucket_no]->tuples.shared);
+	else
+		return table->skewBucket[skew_bucket_no]->tuples.private;
+}
+
+/*
+ * Get the first tuple in a given bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_bucket(HashJoinTable table, int bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		dsa_pointer p =
+			dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+		return (HashJoinTuple) dsa_get_address(table->area, p);
+	}
+	else
+		return table->buckets[bucket_no].private;
+}
+
+/*
+ * Insert a tuple at the front of a given bucket identified by number.  For
+ * shared hash joins, tuple_shared must be provided, pointing to the tuple in
+ * the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+						 HashJoinTuple tuple, dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		Assert(tuple == dsa_get_address(table->area, tuple_shared));
+		for (;;)
+		{
+			tuple->next.shared =
+				dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+			if (dsa_pointer_atomic_compare_exchange(&table->buckets[bucket_no].shared,
+													&tuple->next.shared,
+													tuple_shared))
+				break;
+		}
+	}
+	else
+	{
+		tuple->next.private = table->buckets[bucket_no].private;
+		table->buckets[bucket_no].private = tuple;
+	}
+}
+
+/*
+ * Insert a tuple at the front of a given skew bucket identified by number.
+ * For shared hash joins, tuple_shared must be provided, pointing to the tuple
+ * in the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_skew_bucket(HashJoinTable table, int skew_bucket_no,
+							  HashJoinTuple tuple,
+							  dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		tuple->next.shared =
+			table->skewBucket[skew_bucket_no]->tuples.shared;
+		table->skewBucket[skew_bucket_no]->tuples.shared = tuple_shared;
+	}
+	else
+	{
+		tuple->next.private = table->skewBucket[skew_bucket_no]->tuples.private;
+		table->skewBucket[skew_bucket_no]->tuples.private = tuple;
+	}
+}
+
+/*
  * ExecScanHashBucket
  *		scan a hash bucket for matches to the current outer tuple
  *
@@ -1073,11 +1756,12 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	 * otherwise scan the standard hashtable bucket.
 	 */
 	if (hashTuple != NULL)
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	else if (hjstate->hj_CurSkewBucketNo != INVALID_SKEW_BUCKET_NO)
-		hashTuple = hashtable->skewBucket[hjstate->hj_CurSkewBucketNo]->tuples;
+		hashTuple = first_tuple_in_skew_bucket(hashtable,
+											   hjstate->hj_CurSkewBucketNo);
 	else
-		hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
+		hashTuple = first_tuple_in_bucket(hashtable, hjstate->hj_CurBucketNo);
 
 	while (hashTuple != NULL)
 	{
@@ -1101,7 +1785,7 @@ ExecScanHashBucket(HashJoinState *hjstate,
 			}
 		}
 
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	}
 
 	/*
@@ -1144,6 +1828,21 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		int phase PG_USED_FOR_ASSERTS_ONLY;
+
+		/*
+		 * TODO: This walks the buckets in parallel mode, like the existing
+		 * code, but it might make more sense to hand out chunks to workers
+		 * instead of buckets.
+		 */
+
+		phase = BarrierPhase(&hashtable->shared->barrier);
+		Assert(PHJ_PHASE_TO_SUBPHASE(phase) == PHJ_SUBPHASE_UNMATCHED);
+		Assert(PHJ_PHASE_TO_BATCHNO(phase) == hashtable->curbatch);
+	}
+
 	for (;;)
 	{
 		/*
@@ -1152,21 +1851,35 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 		 * bucket.
 		 */
 		if (hashTuple != NULL)
-			hashTuple = hashTuple->next;
-		else if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
+		else if (HashJoinTableIsShared(hashtable))
 		{
-			hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
-			hjstate->hj_CurBucketNo++;
+			int bucketno =
+				(int) pg_atomic_fetch_add_u32(
+					&hashtable->shared->next_unmatched_bucket, 1);
+
+			if (bucketno >= hashtable->nbuckets)
+				break;			/* finished all buckets */
+
+			hashTuple = first_tuple_in_bucket(hashtable, bucketno);
+
+			/* TODO: parallel skew bucket support */
 		}
-		else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
+		else
 		{
-			int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
+			if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+				hashTuple = first_tuple_in_bucket(hashtable,
+												  hjstate->hj_CurBucketNo++);
+			else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
+			{
+				int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
 
-			hashTuple = hashtable->skewBucket[j]->tuples;
-			hjstate->hj_CurSkewBucketNo++;
+				hashTuple = first_tuple_in_skew_bucket(hashtable, j);
+				hjstate->hj_CurSkewBucketNo++;
+			}
+			else
+				break;				/* finished all buckets */
 		}
-		else
-			break;				/* finished all buckets */
 
 		while (hashTuple != NULL)
 		{
@@ -1191,7 +1904,7 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 				return true;
 			}
 
-			hashTuple = hashTuple->next;
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		}
 	}
 
@@ -1212,6 +1925,65 @@ ExecHashTableReset(HashJoinTable hashtable)
 	MemoryContext oldcxt;
 	int			nbuckets = hashtable->nbuckets;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Wait for all workers to finish accessing the primary hash table. */
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_UNMATCHED);
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_UNMATCHED))
+		{
+			/* Serial phase: promote the secondary table to primary. */
+			dsa_pointer tmp;
+			int i;
+
+			Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+				   PHJ_SUBPHASE_PROMOTING);
+
+			/* Clear the old primary table. */
+			for (i = 0; i < nbuckets; ++i)
+				dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+										 InvalidDsaPointer);
+
+			/* Swap the two tables. */
+			tmp = hashtable->shared->primary_buckets;
+			hashtable->shared->primary_buckets =
+				hashtable->shared->secondary_buckets;
+			hashtable->shared->secondary_buckets = tmp;
+
+			/* Swap the chunk lists. */
+			tmp = dsa_pointer_atomic_read(&hashtable->shared->head_primary_chunk);
+			dsa_pointer_atomic_write(&hashtable->shared->head_primary_chunk,
+									 dsa_pointer_atomic_read(&hashtable->shared->head_secondary_chunk));
+			dsa_pointer_atomic_write(&hashtable->shared->head_secondary_chunk,
+									 tmp);
+
+			/* Free the secondary chunks. */
+			/* TODO: Or put them on a freelist in one cheap operation instead? */
+			tmp = dsa_pointer_atomic_read(&hashtable->shared->head_secondary_chunk);
+			while (DsaPointerIsValid(tmp))
+			{
+				HashMemoryChunk chunk = (HashMemoryChunk)
+					dsa_get_address(hashtable->area, tmp);
+				dsa_pointer next = chunk->next.shared;
+
+				dsa_free(hashtable->area, tmp);
+				tmp = next;
+			}
+			dsa_pointer_atomic_write(&hashtable->shared->head_secondary_chunk,
+									 InvalidDsaPointer);
+
+			/* Reset the unmatched cursor. */
+			pg_atomic_write_u32(&hashtable->shared->next_unmatched_bucket,
+								0);
+		}
+		/* Wait again, so that all workers now have the new table. */
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_PROMOTING);
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_LOADING);
+		ExecHashUpdate(hashtable);
+		return;
+	}
+
 	/*
 	 * Release all the hash buckets and tuples acquired in the prior pass, and
 	 * reinitialize the context for a new pass.
@@ -1220,15 +1992,15 @@ ExecHashTableReset(HashJoinTable hashtable)
 	oldcxt = MemoryContextSwitchTo(hashtable->batchCxt);
 
 	/* Reallocate and reinitialize the hash bucket headers. */
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	hashtable->buckets = (HashJoinBucketHead *)
+		palloc0(nbuckets * sizeof(HashJoinBucketHead));
 
 	hashtable->spaceUsed = 0;
 
 	MemoryContextSwitchTo(oldcxt);
 
 	/* Forget the chunks (the memory was freed by the context reset above). */
-	hashtable->chunks = NULL;
+	hashtable->primary_chunk = NULL;
 }
 
 /*
@@ -1241,10 +2013,14 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	HashJoinTuple tuple;
 	int			i;
 
+	/* TODO: share parallel reset work!  coordinate! */
+
 	/* Reset all flags in the main table ... */
 	for (i = 0; i < hashtable->nbuckets; i++)
 	{
-		for (tuple = hashtable->buckets[i]; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_bucket(hashtable, i);
+			 tuple != NULL;
+			 tuple = next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 
@@ -1252,9 +2028,10 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	for (i = 0; i < hashtable->nSkewBuckets; i++)
 	{
 		int			j = hashtable->skewBucketNums[i];
-		HashSkewBucket *skewBucket = hashtable->skewBucket[j];
 
-		for (tuple = skewBucket->tuples; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_skew_bucket(hashtable, j);
+			 tuple != NULL;
+			 tuple = next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 }
@@ -1414,11 +2191,11 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 				continue;
 
 			/* Okay, create a new skew bucket for this hashvalue. */
-			hashtable->skewBucket[bucket] = (HashSkewBucket *)
+			hashtable->skewBucket[bucket] = (HashSkewBucket *) /* TODO */
 				MemoryContextAlloc(hashtable->batchCxt,
 								   sizeof(HashSkewBucket));
 			hashtable->skewBucket[bucket]->hashvalue = hashvalue;
-			hashtable->skewBucket[bucket]->tuples = NULL;
+			hashtable->skewBucket[bucket]->tuples.private = NULL;
 			hashtable->skewBucketNums[hashtable->nSkewBuckets] = bucket;
 			hashtable->nSkewBuckets++;
 			hashtable->spaceUsed += SKEW_BUCKET_OVERHEAD;
@@ -1496,18 +2273,29 @@ ExecHashSkewTableInsert(HashJoinTable hashtable,
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	HashJoinTuple hashTuple;
 	int			hashTupleSize;
+	dsa_pointer tuple_pointer;
 
 	/* Create the HashJoinTuple */
 	hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-	hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
-												   hashTupleSize);
+	if (HashJoinTableIsShared(hashtable))
+	{
+		tuple_pointer = dsa_allocate(hashtable->area, hashTupleSize);
+		hashTuple = (HashJoinTuple) dsa_get_address(hashtable->area,
+													tuple_pointer);
+	}
+	else
+	{
+		tuple_pointer = InvalidDsaPointer;
+		hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
+													   hashTupleSize);
+	}
 	hashTuple->hashvalue = hashvalue;
 	memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
 	HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 	/* Push it onto the front of the skew bucket's list */
-	hashTuple->next = hashtable->skewBucket[bucketNumber]->tuples;
-	hashtable->skewBucket[bucketNumber]->tuples = hashTuple;
+	insert_tuple_into_skew_bucket(hashtable, bucketNumber, hashTuple,
+								  tuple_pointer);
 
 	/* Account for space used, and back off if we've used too much */
 	hashtable->spaceUsed += hashTupleSize;
@@ -1538,6 +2326,9 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	int			batchno;
 	HashJoinTuple hashTuple;
 
+	/* TODO: skew buckets not yet supported for parallel mode */
+	Assert(!HashJoinTableIsShared(hashtable));
+
 	/* Locate the bucket to remove */
 	bucketToRemove = hashtable->skewBucketNums[hashtable->nSkewBuckets - 1];
 	bucket = hashtable->skewBucket[bucketToRemove];
@@ -1552,10 +2343,10 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	ExecHashGetBucketAndBatch(hashtable, hashvalue, &bucketno, &batchno);
 
 	/* Process all tuples in the bucket */
-	hashTuple = bucket->tuples;
+	hashTuple = first_tuple_in_skew_bucket(hashtable, bucketToRemove);
 	while (hashTuple != NULL)
 	{
-		HashJoinTuple nextHashTuple = hashTuple->next;
+		HashJoinTuple nextHashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		MinimalTuple tuple;
 		Size		tupleSize;
 
@@ -1581,8 +2372,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 			memcpy(copyTuple, hashTuple, tupleSize);
 			pfree(hashTuple);
 
-			copyTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = copyTuple;
+			insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+									 InvalidDsaPointer);
 
 			/* We have reduced skew space, but overall space doesn't change */
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -1591,8 +2382,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 		{
 			/* Put the tuple into a temp file for later batches */
 			Assert(batchno > hashtable->curbatch);
-			ExecHashJoinSaveTuple(tuple, hashvalue,
-								  &hashtable->innerBatchFile[batchno]);
+			ExecHashJoinSaveTuple(hashtable, tuple, hashvalue,
+								  batchno, true);
 			pfree(hashTuple);
 			hashtable->spaceUsed -= tupleSize;
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -1636,6 +2427,141 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 }
 
 /*
+ * Add to the primary or secondary tuple counter.
+ */
+static void
+add_tuple_count(HashJoinTable hashtable, int count, bool secondary)
+{
+	if (secondary)
+		pg_atomic_fetch_add_u64(&hashtable->shared->total_secondary_tuples,
+								count);
+	else
+	{
+		uint32 total =
+			pg_atomic_fetch_add_u64(&hashtable->shared->total_primary_tuples,
+									count);
+		/* Also update this backend's counter. */
+		hashtable->totalTuples = total + count;
+	}
+}
+
+/*
+ * Allocate 'size' bytes from the currently active shared HashMemoryChunk.
+ * This is essentially the same as the private memory version, but allocates
+ * from separate chunks for the secondary table and periodically updates the
+ * shared tuple counter.
+ */
+static void *
+dense_alloc_shared(HashJoinTable hashtable,
+				   Size size,
+				   dsa_pointer *shared,
+				   bool secondary)
+{
+	dsa_pointer chunk_shared;
+	HashMemoryChunk chunk;
+	char	   *ptr;
+
+	/* just in case the size is not already aligned properly */
+	size = MAXALIGN(size);
+
+	/*
+	 * If tuple size is larger than of 1/4 of chunk size, allocate a separate
+	 * chunk.
+	 */
+	if (size > HASH_CHUNK_THRESHOLD)
+	{
+		/* allocate new chunk */
+		chunk_shared =
+			dsa_allocate(hashtable->area,
+						 offsetof(HashMemoryChunkData, data) + size);
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, chunk_shared);
+		*shared = chunk_shared + offsetof(HashMemoryChunkData, data);
+		chunk->maxlen = size;
+		chunk->used = size;
+		chunk->ntuples = 1;
+
+		/*
+		 * Push onto the appropriate chunk list, but don't make it the current
+		 * chunk because it hasn't got any more useful space in it.  The
+		 * current chunk may still have space, so keep that one current.
+		 */
+		ExecHashPushChunk(hashtable, chunk, chunk_shared,
+						  secondary ?
+						  &hashtable->shared->head_secondary_chunk :
+						  &hashtable->shared->head_primary_chunk);
+
+		/* Count these huge tuples immediately. */
+		add_tuple_count(hashtable, 1, secondary);
+		return chunk->data;
+	}
+
+	/*
+	 * See if we have enough space for it in the current chunk (if any). If
+	 * not, allocate a fresh chunk.
+	 */
+	chunk = secondary ? hashtable->secondary_chunk : hashtable->primary_chunk;
+	if (chunk == NULL || (chunk->maxlen - chunk->used) < size)
+	{
+		/*
+		 * Add the tuplecount for the outgoing chunk to the shared counter.
+		 * Doing this only every time we need to allocate a new chunk should
+		 * reduce contention on the shared counter.
+		 */
+		if (chunk != NULL)
+			add_tuple_count(hashtable, chunk->ntuples, secondary);
+
+		/*
+		 * Allocate new chunk and make it the current chunk for this backend
+		 * to allocate from.
+		 */
+		chunk_shared =
+			dsa_allocate(hashtable->area,
+						 offsetof(HashMemoryChunkData, data) +
+						 HASH_CHUNK_SIZE);
+		chunk = (HashMemoryChunk)
+			dsa_get_address(hashtable->area, chunk_shared);
+		*shared = chunk_shared + offsetof(HashMemoryChunkData, data);
+		if (secondary)
+		{
+			hashtable->secondary_chunk = chunk;
+			hashtable->secondary_chunk_shared = chunk_shared;
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_secondary_chunk);
+		}
+		else
+		{
+			hashtable->primary_chunk = chunk;
+			hashtable->primary_chunk_shared = chunk_shared;
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->head_primary_chunk);
+		}
+		chunk->maxlen = HASH_CHUNK_SIZE;
+		chunk->used = size;
+		chunk->ntuples = 1;
+
+		/*
+		 * The shared tuple counter will be updated when this chunk is
+		 * eventually full.  See above.
+		 */
+
+		return chunk->data;
+	}
+
+	/* There is enough space in the current chunk, let's add the tuple */
+	chunk_shared =
+		secondary ? hashtable->secondary_chunk_shared :
+		hashtable->primary_chunk_shared;
+	ptr = chunk->data + chunk->used;
+	*shared = chunk_shared + offsetof(HashMemoryChunkData, data) + chunk->used;
+	chunk->used += size;
+	chunk->ntuples += 1;
+
+	/* return pointer to the start of the tuple memory */
+	return ptr;
+}
+
+/*
  * Allocate 'size' bytes from the currently active HashMemoryChunk
  */
 static void *
@@ -1653,9 +2579,11 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 */
 	if (size > HASH_CHUNK_THRESHOLD)
 	{
+
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-								 offsetof(HashMemoryChunkData, data) + size);
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) + size);
 		newChunk->maxlen = size;
 		newChunk->used = 0;
 		newChunk->ntuples = 0;
@@ -1664,15 +2592,15 @@ dense_alloc(HashJoinTable hashtable, Size size)
 		 * Add this chunk to the list after the first existing chunk, so that
 		 * we don't lose the remaining space in the "current" chunk.
 		 */
-		if (hashtable->chunks != NULL)
+		if (hashtable->primary_chunk != NULL)
 		{
-			newChunk->next = hashtable->chunks->next;
-			hashtable->chunks->next = newChunk;
+			newChunk->next.private = hashtable->primary_chunk->next.private;
+			hashtable->primary_chunk->next.private = newChunk;
 		}
 		else
 		{
-			newChunk->next = hashtable->chunks;
-			hashtable->chunks = newChunk;
+			newChunk->next.private = NULL;
+			hashtable->primary_chunk = newChunk;
 		}
 
 		newChunk->used += size;
@@ -1685,27 +2613,27 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 * See if we have enough space for it in the current chunk (if any). If
 	 * not, allocate a fresh chunk.
 	 */
-	if ((hashtable->chunks == NULL) ||
-		(hashtable->chunks->maxlen - hashtable->chunks->used) < size)
+	if ((hashtable->primary_chunk == NULL) ||
+		(hashtable->primary_chunk->maxlen - hashtable->primary_chunk->used) < size)
 	{
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-					  offsetof(HashMemoryChunkData, data) + HASH_CHUNK_SIZE);
-
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) +
+							   HASH_CHUNK_SIZE);
+		newChunk->next.private = hashtable->primary_chunk;
+		hashtable->primary_chunk = newChunk;
 		newChunk->maxlen = HASH_CHUNK_SIZE;
 		newChunk->used = size;
 		newChunk->ntuples = 1;
 
-		newChunk->next = hashtable->chunks;
-		hashtable->chunks = newChunk;
-
 		return newChunk->data;
 	}
 
 	/* There is enough space in the current chunk, let's add the tuple */
-	ptr = hashtable->chunks->data + hashtable->chunks->used;
-	hashtable->chunks->used += size;
-	hashtable->chunks->ntuples += 1;
+	ptr = hashtable->primary_chunk->data + hashtable->primary_chunk->used;
+	hashtable->primary_chunk->used += size;
+	hashtable->primary_chunk->ntuples += 1;
 
 	/* return pointer to the start of the tuple memory */
 	return ptr;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 369e666..b8f90a6 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -21,8 +21,11 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/barrier.h"
 #include "utils/memutils.h"
 
+#include <unistd.h> /* TODO: remove */
 
 /*
  * States of the ExecHashJoin state machine
@@ -42,11 +45,13 @@
 static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode,
 						  HashJoinState *hjstate,
 						  uint32 *hashvalue);
-static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
-						  BufFile *file,
+static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinTable hashtable,
 						  uint32 *hashvalue,
 						  TupleTableSlot *tupleSlot);
 static bool ExecHashJoinNewBatch(HashJoinState *hjstate);
+static void ExecHashJoinLoadBatch(HashJoinState *hjstate);
+static void ExecHashJoinExportBatches(HashJoinTable hashtable);
+static void ExecHashJoinPreloadNextBatch(HashJoinTable hashtable);
 
 
 /* ----------------------------------------------------------------
@@ -147,6 +152,14 @@ ExecHashJoin(HashJoinState *node)
 					/* no chance to not build the hash table */
 					node->hj_FirstOuterTupleSlot = NULL;
 				}
+				else if (hashNode->shared_table_data != NULL)
+				{
+					/*
+					 * TODO: The empty-outer optimization is not implemented
+					 * for shared hash tables yet.
+					 */
+					node->hj_FirstOuterTupleSlot = NULL;
+				}
 				else if (HJ_FILL_OUTER(node) ||
 						 (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
 						  !node->hj_OuterNotEmpty))
@@ -166,7 +179,7 @@ ExecHashJoin(HashJoinState *node)
 				/*
 				 * create the hash table
 				 */
-				hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
+				hashtable = ExecHashTableCreate(hashNode,
 												node->hj_HashOperators,
 												HJ_FILL_INNER(node));
 				node->hj_HashTable = hashtable;
@@ -177,12 +190,29 @@ ExecHashJoin(HashJoinState *node)
 				hashNode->hashtable = hashtable;
 				(void) MultiExecProcNode((PlanState *) hashNode);
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(BarrierPhase(&hashtable->shared->barrier) >=
+						   PHJ_PHASE_HASHING);
+
+					/* Allow other backends to access batches we generated. */
+					ExecHashJoinExportBatches(hashtable);
+
+					/*
+					 * Check if we are a worker that attached too late to
+					 * avoid deadlock risk with the leader.
+					 */
+					if (ExecHashCheckForEarlyExit(hashtable))
+						return NULL;
+				}
+
 				/*
 				 * If the inner relation is completely empty, and we're not
 				 * doing a left outer join, we can quit without scanning the
 				 * outer relation.
 				 */
-				if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
+				if (!HashJoinTableIsShared(hashtable) && /* TODO:TM */
+					hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
 					return NULL;
 
 				/*
@@ -198,12 +228,73 @@ ExecHashJoin(HashJoinState *node)
 				 */
 				node->hj_OuterNotEmpty = false;
 
-				node->hj_JoinState = HJ_NEED_NEW_OUTER;
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Barrier *barrier = &hashtable->shared->barrier;
+					int phase = BarrierPhase(barrier);
+
+					/*
+					 * Map the current phase to the appropriate initial state
+					 * for this worker, so we can get started.
+					 */
+					Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+					hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(phase);
+					switch (PHJ_PHASE_TO_SUBPHASE(phase))
+					{
+					case PHJ_SUBPHASE_PROMOTING:
+						/* Wait for serial phase to finish. */
+						BarrierWait(barrier, WAIT_EVENT_HASHJOIN_PROMOTING);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_LOADING);
+						/* fall through */
+					case PHJ_SUBPHASE_LOADING:
+						/* Help load the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinOpenBatch(hashtable, hashtable->curbatch,
+											  true);
+						ExecHashJoinLoadBatch(node);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_PROBING);
+						/* fall through */
+					case PHJ_SUBPHASE_PREPARING:
+						/* Wait for serial phase to finish. */
+						BarrierWait(barrier, WAIT_EVENT_HASHJOIN_PROMOTING);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_PROBING);
+						/* fall through */
+					case PHJ_SUBPHASE_PROBING:
+						/* Help probe the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinOpenBatch(hashtable, hashtable->curbatch,
+											  false);
+						node->hj_JoinState = HJ_NEED_NEW_OUTER;
+						break;
+					case PHJ_SUBPHASE_UNMATCHED:
+						/* Help scan for unmatched inner tuples. */
+						ExecHashUpdate(hashtable);
+						node->hj_JoinState = HJ_FILL_INNER_TUPLES;
+						break;
+					}
+					continue;
+				}
+				else
+				{
+					node->hj_JoinState = HJ_NEED_NEW_OUTER;
+					ExecHashJoinOpenBatch(hashtable, 0, false);
+				}
 
 				/* FALL THRU */
 
 			case HJ_NEED_NEW_OUTER:
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(PHJ_PHASE_TO_BATCHNO(BarrierPhase(&hashtable->shared->barrier)) ==
+						   hashtable->curbatch);
+					Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+						   PHJ_SUBPHASE_PROBING);
+				}
+
 				/*
 				 * We don't have an outer tuple, try to get the next one
 				 */
@@ -213,6 +304,47 @@ ExecHashJoin(HashJoinState *node)
 				if (TupIsNull(outerTupleSlot))
 				{
 					/* end of batch, or maybe whole join */
+
+					/*
+					 * Switch to reading tuples from the next inner batch.  We
+					 * do this here because in the shared hash table case we
+					 * want to do this before ExecHashJoinPreloadNextBatch.
+					 */
+					if (hashtable->curbatch + 1 < hashtable->nbatch)
+						ExecHashJoinOpenBatch(hashtable,
+											  hashtable->curbatch + 1,
+											  true);
+
+					if (HashJoinTableIsShared(hashtable))
+					{
+						/* Allow other backends to access our batches. */
+						ExecHashJoinExportBatches(hashtable);
+						/*
+						 * Check if we are a leader that can't go further than
+						 * probing the first batch without deadlock risk,
+						 * because there are workers running.
+						 */
+						if (ExecHashCheckForEarlyExit(hashtable))
+							return NULL;
+
+						/*
+						 * We may be able to load some amount of the next
+						 * batch into spare work_mem, before we start waiting
+						 * for other workers to finish probing the current
+						 * batch.
+						 */
+						ExecHashJoinPreloadNextBatch(hashtable);
+
+						/*
+						 * You can't start searching for unmatched tuples
+						 * until all workers have finished probing, so we
+						 * synchronize here.
+						 */
+						BarrierWait(&hashtable->shared->barrier,
+									WAIT_EVENT_HASHJOIN_PROBING);
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_UNMATCHED_BATCH(hashtable->curbatch));
+					}
 					if (HJ_FILL_INNER(node))
 					{
 						/* set up to scan for unmatched inner tuples */
@@ -250,9 +382,9 @@ ExecHashJoin(HashJoinState *node)
 					 * Save it in the corresponding outer-batch file.
 					 */
 					Assert(batchno > hashtable->curbatch);
-					ExecHashJoinSaveTuple(ExecFetchSlotMinimalTuple(outerTupleSlot),
-										  hashvalue,
-										&hashtable->outerBatchFile[batchno]);
+					ExecHashJoinSaveTuple(hashtable,
+										  ExecFetchSlotMinimalTuple(outerTupleSlot),
+										  hashvalue, batchno, false);
 					/* Loop around, staying in HJ_NEED_NEW_OUTER state */
 					continue;
 				}
@@ -296,6 +428,13 @@ ExecHashJoin(HashJoinState *node)
 				if (joinqual == NIL || ExecQual(joinqual, econtext, false))
 				{
 					node->hj_MatchedOuter = true;
+					/*
+					 * Note: it is OK to do this in a shared hash table
+					 * without any kind of memory synchronization, because the
+					 * only transition is 0->1, so ordering doesn't matter if
+					 * several backends do it, and there will be a memory
+					 * barrier before anyone reads it.
+					 */
 					HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
 
 					/* In an antijoin, we never return a matched tuple */
@@ -632,6 +771,29 @@ ExecEndHashJoin(HashJoinState *node)
 }
 
 /*
+ * For shared hash joins, load as much of the next batch as we can as part of
+ * the probing phase for the current batch.  This overlapping means that we do
+ * something useful before we start waiting for other workers.
+ */
+static void
+ExecHashJoinPreloadNextBatch(HashJoinTable hashtable)
+{
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier PG_USED_FOR_ASSERTS_ONLY = &hashtable->shared->barrier;
+		int curbatch = hashtable->curbatch;
+		int next_batch = curbatch + 1;
+
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING_BATCH(curbatch));
+
+		if (next_batch < hashtable->nbatch)
+		{
+			/* TODO: Load into secondary hash table while memory is free! */
+		}
+	}
+}
+
+/*
  * ExecHashJoinOuterGetTuple
  *
  *		get the next outer tuple for hashjoin: either by
@@ -702,8 +864,7 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 		if (file == NULL)
 			return NULL;
 
-		slot = ExecHashJoinGetSavedTuple(hjstate,
-										 file,
+		slot = ExecHashJoinGetSavedTuple(hashtable,
 										 hashvalue,
 										 hjstate->hj_OuterTupleSlot);
 		if (!TupIsNull(slot))
@@ -726,13 +887,14 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	int			nbatch;
 	int			curbatch;
-	BufFile    *innerFile;
-	TupleTableSlot *slot;
-	uint32		hashvalue;
 
 	nbatch = hashtable->nbatch;
 	curbatch = hashtable->curbatch;
 
+	if (HashJoinTableIsShared(hashtable))
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_UNMATCHED_BATCH(curbatch));
+
 	if (curbatch > 0)
 	{
 		/*
@@ -776,7 +938,8 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	 * need to be reassigned.
 	 */
 	curbatch++;
-	while (curbatch < nbatch &&
+	while (!HashJoinTableIsShared(hashtable) &&
+		   curbatch < nbatch &&
 		   (hashtable->outerBatchFile[curbatch] == NULL ||
 			hashtable->innerBatchFile[curbatch] == NULL))
 	{
@@ -792,7 +955,6 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (hashtable->outerBatchFile[curbatch] &&
 			nbatch != hashtable->nbatch_outstart)
 			break;				/* must process due to rule 3 */
-		/* We can ignore this batch. */
 		/* Release associated temp files right away. */
 		if (hashtable->innerBatchFile[curbatch])
 			BufFileClose(hashtable->innerBatchFile[curbatch]);
@@ -812,48 +974,175 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	 * Reload the hash table with the new inner batch (which could be empty)
 	 */
 	ExecHashTableReset(hashtable);
+	ExecHashJoinLoadBatch(hjstate);
 
-	innerFile = hashtable->innerBatchFile[curbatch];
+	return true;
+}
+
+static void
+ExecHashJoinLoadBatch(HashJoinState *hjstate)
+{
+	HashJoinTable hashtable = hjstate->hj_HashTable;
+	int			curbatch = hashtable->curbatch;
+	TupleTableSlot *slot;
+	uint32		hashvalue;
 
-	if (innerFile != NULL)
+	if (HashJoinTableIsShared(hashtable))
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_LOADING);
+
+	/*
+	 * In HJ_NEED_NEW_OUTER, we already selected the current inner batch for
+	 * reading from.  If there is a shared hash table, we may have already
+	 * partially loaded the hash table in ExecHashJoinPreloadNextBatch.
+	 */
+	Assert(hashtable->batch_reader.batchno = curbatch);
+	Assert(hashtable->batch_reader.inner);
+
+	for (;;)
 	{
-		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-				   errmsg("could not rewind hash-join temporary file: %m")));
+		slot = ExecHashJoinGetSavedTuple(hashtable,
+										 &hashvalue,
+										 hjstate->hj_HashTupleSlot);
 
-		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
-												 innerFile,
-												 &hashvalue,
-												 hjstate->hj_HashTupleSlot)))
-		{
-			/*
-			 * NOTE: some tuples may be sent to future batches.  Also, it is
-			 * possible for hashtable->nbatch to be increased here!
-			 */
-			ExecHashTableInsert(hashtable, slot, hashvalue);
-		}
+		if (slot == NULL)
+			break;
 
 		/*
-		 * after we build the hash table, the inner batch file is no longer
-		 * needed
+		 * NOTE: some tuples may be sent to future batches.  Also, it is
+		 * possible for hashtable->nbatch to be increased here!
 		 */
-		BufFileClose(innerFile);
-		hashtable->innerBatchFile[curbatch] = NULL;
+		ExecHashTableInsert(hashtable, slot, hashvalue, false);
 	}
 
 	/*
-	 * Rewind outer batch file (if present), so that we can start reading it.
+	 * Now that we have finished loading this batch into the hash table, we
+	 * can set our outer batch read head to the start of the current batch,
+	 * and our inner batch read head to the start of the NEXT batch (as
+	 * expected by ExecHashJoinPreloadNextBatch).
 	 */
-	if (hashtable->outerBatchFile[curbatch] != NULL)
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Wait until all workers have finished loading their portion of the
+		 * hash table.
+		 */
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASHJOIN_LOADING))
+		{
+			/* Serial phase: prepare to read this outer and next inner batch */
+			ExecHashJoinRewindBatches(hashtable, hashtable->curbatch);
+		}
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_PREPARING_BATCH(hashtable->curbatch));
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASHJOIN_PREPARING);
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+	}
+	else
+		ExecHashJoinRewindBatches(hashtable, hashtable->curbatch);
+
+	/*
+	 * The inner batch file is no longer needed by any participant, because
+	 * the hash table has been fully reloaded.
+	 */
+	ExecHashJoinCloseBatch(hashtable, hashtable->curbatch, true);
+
+	/* Prepare to read from the current outer batch. */
+	ExecHashJoinOpenBatch(hashtable, hashtable->curbatch, false);
+}
+
+/*
+ * Export a BufFile, copy the descriptor to DSA memory and return the
+ * dsa_pointer.
+ */
+static dsa_pointer
+make_batch_descriptor(dsa_area *area, BufFile *file)
+{
+	dsa_pointer pointer;
+	BufFileDescriptor *source;
+	BufFileDescriptor *target;
+	size_t size;
+
+	source = BufFileExport(file);
+	size = BufFileDescriptorSize(source);
+	pointer = dsa_allocate(area, size);
+	if (!DsaPointerIsValid(pointer))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed on dsa_allocate of size %zu.", size)));
+	target = dsa_get_address(area, pointer);
+	memcpy(target, source, size);
+	pfree(source);
+
+	return pointer;
+}
+
+/*
+ * Publish a batch descriptor for a future batch so that other participants
+ * can import it and read it.  If 'descriptor' is InvalidDsaPointer, then
+ * forget the published descriptor so that it will be reexported later.
+ */
+static void
+set_batch_descriptor(HashJoinTable hashtable, int batchno, bool inner,
+					 dsa_pointer descriptor)
+{
+	HashJoinParticipantState *participant;
+	dsa_pointer *level1;
+	dsa_pointer *level2;
+	int rank;
+	int index;
+
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	rank = fls(batchno);
+	index = batchno % (1 << (rank - 1));
+	level1 = inner ? participant->inner_batch_descriptors
+				   : participant->outer_batch_descriptors;
+	if (level1[rank] == InvalidDsaPointer)
 	{
-		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
+		size_t size = sizeof(dsa_pointer) * (1 << rank);
+
+		level1[rank] = dsa_allocate(hashtable->area, size);
+		if (level1[rank] == InvalidDsaPointer)
 			ereport(ERROR,
-					(errcode_for_file_access(),
-				   errmsg("could not rewind hash-join temporary file: %m")));
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory"),
+					 errdetail("Failed on dsa_allocate of size %zu.", size)));
+		level2 = dsa_get_address(hashtable->area, level1[rank]);
+		memset(level2, 0, size);
 	}
+	level2 = dsa_get_address(hashtable->area, level1[rank]);
+	if (level2[index] != InvalidDsaPointer)
+		dsa_free(hashtable->area, level2[index]);
+	level2[index] = descriptor;
+}
 
-	return true;
+/*
+ * Get a batch descriptor published by a given participant, if there is one.
+ */
+static BufFileDescriptor *
+get_batch_descriptor(HashJoinTable hashtable, int participant_number,
+					 int batchno, bool inner)
+{
+	HashJoinParticipantState *participant;
+	dsa_pointer *level1;
+	dsa_pointer *level2;
+	int rank;
+	int index;
+
+	participant = &hashtable->shared->participants[participant_number];
+	rank = fls(batchno);
+	index = batchno % (1 << (rank - 1));
+	level1 = inner ? participant->inner_batch_descriptors
+				   : participant->outer_batch_descriptors;
+	if (level1[rank] == InvalidDsaPointer)
+		return NULL;
+	level2 = dsa_get_address(hashtable->area, level1[rank]);
+	if (level2[index] == InvalidDsaPointer)
+		return NULL;
+
+	return (BufFileDescriptor *)
+		dsa_get_address(hashtable->area, level2[index]);
 }
 
 /*
@@ -868,17 +1157,40 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
  * will get messed up.
  */
 void
-ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr)
+ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno,
+					  bool inner)
 {
-	BufFile    *file = *fileptr;
+	BufFile    *file;
 	size_t		written;
 
+	if (inner)
+		file = hashtable->innerBatchFile[batchno];
+	else
+		file = hashtable->outerBatchFile[batchno];
 	if (file == NULL)
 	{
 		/* First write to this batch file, so open it. */
 		file = BufFileCreateTemp(false);
-		*fileptr = file;
+		if (inner)
+			hashtable->innerBatchFile[batchno] = file;
+		else
+			hashtable->outerBatchFile[batchno] = file;
+	}
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* This batch needs to be re-exported, if it was already exported. */
+		/*
+		 * TODO: This is far too expensive: need a bitmap?  or maybe just
+		 * export every batch when it's the next one to be processed,
+		 * regardless of whether we've written anything to it (the point being
+		 * that the list of files backing a BufFile can change when you write
+		 * to it)?  If we do that then we still need to export ALL before
+		 * exiting early.
+		 */
+		set_batch_descriptor(hashtable, batchno, inner, InvalidDsaPointer);
 	}
 
 	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
@@ -895,54 +1207,337 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 }
 
 /*
+ * Export unexported future batches created by this participant, so that other
+ * participants can read from them after they have finished reading their own.
+ */
+static void
+ExecHashJoinExportBatches(HashJoinTable hashtable)
+{
+	int i;
+
+	/* Find this participant's HashJoinParticipantState object. */
+	Assert(HashJoinParticipantNumber() < hashtable->shared->planned_participants);
+
+	/* Export future batches and copy their descriptors into DSA memory. */
+	for (i = hashtable->curbatch + 1; i < hashtable->nbatch; ++i)
+	{
+		if (hashtable->innerBatchFile[i] != NULL &&
+			get_batch_descriptor(hashtable, HashJoinParticipantNumber(), i, true) == InvalidDsaPointer)
+			set_batch_descriptor(hashtable, i, true,
+				make_batch_descriptor(hashtable->area, hashtable->innerBatchFile[i]));
+		if (hashtable->outerBatchFile[i] != NULL &&
+			get_batch_descriptor(hashtable, HashJoinParticipantNumber(), i, false) == InvalidDsaPointer)
+			set_batch_descriptor(hashtable, i, false,
+				make_batch_descriptor(hashtable->area, hashtable->outerBatchFile[i]));
+	}
+}
+
+/*
+ * Select the batch file that ExecHashJoinGetSavedTuple will read from.
+ */
+void
+ExecHashJoinOpenBatch(HashJoinTable hashtable, int batchno, bool inner)
+{
+	HashJoinBatchReader *batch_reader = &hashtable->batch_reader;
+
+	if (batchno == 0)
+		batch_reader->file = NULL;
+	else
+		batch_reader->file = inner
+			? hashtable->innerBatchFile[batchno]
+			: hashtable->outerBatchFile[batchno];
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		HashJoinParticipantState *participant;
+
+		participant =
+			&hashtable->shared->participants[HashJoinParticipantNumber()];
+		batch_reader->shared = inner
+			? &participant->inner_batch_reader
+			: &participant->outer_batch_reader;
+		/* We will seek to the shared position at next read. */
+		batch_reader->head.fileno = -1;
+		batch_reader->head.offset = -1;
+	}
+	else
+	{
+		batch_reader->shared = NULL;
+		/* Seek to start of batch now, if there is one. */
+		if (batch_reader->file != NULL)
+			BufFileSeek(batch_reader->file, 0, 0, SEEK_SET);
+	}
+
+	batch_reader->participant_number = HashJoinParticipantNumber();
+	batch_reader->batchno = batchno;
+	batch_reader->inner = inner;
+
+}
+
+/*
+ * Close a batch, once it is not needed by any participant.  This causes batch
+ * files created by this participant to be deleted.
+ */
+void
+ExecHashJoinCloseBatch(HashJoinTable hashtable, int batchno, bool inner)
+{
+	HashJoinParticipantState *participant;
+	HashJoinBatchReader *batch_reader;
+	BufFile *file;
+
+	/*
+	 * We only need to close the batch owned by THIS participant.  That causes
+	 * it to be deleted.  Batches opened in this backend but created by other
+	 * participants are closed by ExecHashJoinGetSavedTuple when it reaches
+	 * the end of the file, allowing them to be closed sooner.
+	 */
+	batch_reader = &hashtable->batch_reader;
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	if (inner)
+	{
+		file = hashtable->innerBatchFile[batchno];
+		hashtable->innerBatchFile[batchno] = NULL;
+	}
+	else
+	{
+		file = hashtable->outerBatchFile[batchno];
+		hashtable->outerBatchFile[batchno] = NULL;
+	}
+	if (file == NULL)
+		return;
+
+	Assert(batch_reader->file == NULL || file == batch_reader->file);
+	BufFileClose(file);
+	batch_reader->file = NULL;
+}
+
+/*
+ * Rewind batch readers.  The outer batch reader is rewound to the start of
+ * batchno.  The inner batch reader is rewound to the start of batchno + 1, in
+ * anticipation of preloading the next batch.
+ */
+void
+ExecHashJoinRewindBatches(HashJoinTable hashtable, int batchno)
+{
+	HashJoinBatchReader *batch_reader;
+	int i;
+
+	batch_reader = &hashtable->batch_reader;
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_CREATING ||
+			   (PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+				PHJ_SUBPHASE_PREPARING &&
+				PHJ_PHASE_TO_BATCHNO(BarrierPhase(&hashtable->shared->barrier)) ==
+				batchno));
+
+		/*
+		 * Position the shared read heads for each participant's batch.
+		 * Readers will seek their BufFile as required to synchronize.
+		 */
+		for (i = 0; i < hashtable->shared->planned_participants; ++i)
+		{
+			HashJoinSharedBatchReader *reader;
+
+			reader = &hashtable->shared->participants[i].outer_batch_reader;
+			reader->batchno = batchno; /* for probing this batch */
+			reader->head.fileno = 0;
+			reader->head.offset = 0;
+
+			reader = &hashtable->shared->participants[i].inner_batch_reader;
+			reader->batchno = batchno + 1; /* for preloading the next batch */
+			reader->head.fileno = 0;
+			reader->head.offset = 0;
+		}
+	}
+}
+
+/*
  * ExecHashJoinGetSavedTuple
- *		read the next tuple from a batch file.  Return NULL if no more.
+ *		read the next tuple from the batch selected with
+ *		ExecHashJoinOpenBatch, including the batch files of
+ *		other participants if the hash table is shared.  Return NULL if no
+ *		more.
  *
  * On success, *hashvalue is set to the tuple's hash value, and the tuple
  * itself is stored in the given slot.
  */
 static TupleTableSlot *
-ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
-						  BufFile *file,
+ExecHashJoinGetSavedTuple(HashJoinTable hashtable,
 						  uint32 *hashvalue,
 						  TupleTableSlot *tupleSlot)
 {
-	uint32		header[2];
-	size_t		nread;
-	MinimalTuple tuple;
+	TupleTableSlot *result = NULL;
+	HashJoinBatchReader *batch_reader = &hashtable->batch_reader;
+	BufFileDescriptor *descriptor;
 
-	/*
-	 * Since both the hash value and the MinimalTuple length word are uint32,
-	 * we can read them both in one BufFileRead() call without any type
-	 * cheating.
-	 */
-	nread = BufFileRead(file, (void *) header, sizeof(header));
-	if (nread == 0)				/* end of file */
+	for (;;)
 	{
-		ExecClearTuple(tupleSlot);
-		return NULL;
-	}
-	if (nread != sizeof(header))
-		ereport(ERROR,
-				(errcode_for_file_access(),
+		uint32		header[2];
+		size_t		nread;
+		MinimalTuple tuple;
+		bool		can_close = false;
+
+		if (batch_reader->file == NULL)
+		{
+			/*
+			 * No file found for the current participant.  Try stealing tuples
+			 * from the next participant.
+			 */
+			goto next_participant;
+		}
+
+		if (HashJoinTableIsShared(hashtable))
+		{
+			LWLockAcquire(&batch_reader->shared->lock, LW_EXCLUSIVE);
+			Assert(batch_reader->shared->batchno == batch_reader->batchno);
+			if (batch_reader->shared->error)
+			{
+				/* Don't try to read if reading failed in some other backend. */
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read from hash-join temporary file")));
+			}
+
+			/* Set the shared error flag, which we'll clear if we succeed. */
+			batch_reader->shared->error = true;
+
+			/*
+			 * If another worker has moved the shared read head since we last read,
+			 * we'll need to seek to the new shared position.
+			 */
+			if (batch_reader->head.fileno != batch_reader->shared->head.fileno ||
+				batch_reader->head.offset != batch_reader->shared->head.offset)
+			{
+				BufFileSeek(batch_reader->file,
+							batch_reader->shared->head.fileno,
+							batch_reader->shared->head.offset,
+							SEEK_SET);
+				batch_reader->head = batch_reader->shared->head;
+			}
+		}
+
+		/* Try to read the size and hash. */
+		nread = BufFileRead(batch_reader->file, (void *) header, sizeof(header));
+		if (nread > 0)
+		{
+			if (nread != sizeof(header))
+			{
+				ereport(ERROR,
+					(errcode_for_file_access(),
 				 errmsg("could not read from hash-join temporary file: %m")));
-	*hashvalue = header[0];
-	tuple = (MinimalTuple) palloc(header[1]);
-	tuple->t_len = header[1];
-	nread = BufFileRead(file,
-						(void *) ((char *) tuple + sizeof(uint32)),
-						header[1] - sizeof(uint32));
-	if (nread != header[1] - sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
+			}
+			*hashvalue = header[0];
+			tuple = (MinimalTuple) palloc(header[1]);
+			tuple->t_len = header[1];
+			nread = BufFileRead(batch_reader->file,
+								(void *) ((char *) tuple + sizeof(uint32)),
+								header[1] - sizeof(uint32));
+			if (nread != header[1] - sizeof(uint32))
+			{
+				ereport(ERROR,
+						(errcode_for_file_access(),
 				 errmsg("could not read from hash-join temporary file: %m")));
-	return ExecStoreMinimalTuple(tuple, tupleSlot, true);
-}
+			}
+
+			result = ExecStoreMinimalTuple(tuple, tupleSlot, true);
+
+		}
+
+		if (HashJoinTableIsShared(hashtable))
+		{
+			if (nread == 0 &&
+				batch_reader->participant_number !=
+				HashJoinParticipantNumber())
+			{
+				/*
+				 * We've reached the end of another paticipant's batch file,
+				 * so close it now.  We'll deal with closing THIS
+				 * participant's batch file later, because we don't want the
+				 * files to be deleted just yet.
+				 */
+				can_close = true;
+			}
+			/* Commit new head position to shared memory and clear error. */
+			BufFileTell(batch_reader->file,
+						&batch_reader->head.fileno,
+						&batch_reader->head.offset);
+			batch_reader->shared->head = batch_reader->head;
+			batch_reader->shared->error = false;
+			LWLockRelease(&batch_reader->shared->lock);
+		}
+
+		if (can_close)
+		{
+			BufFileClose(batch_reader->file);
+			batch_reader->file = NULL;
+		}
+
+		if (result != NULL)
+			return result;
+
+next_participant:
+		if (!HashJoinTableIsShared(hashtable))
+		{
+			/* Private hash table, end of batch. */
+			ExecClearTuple(tupleSlot);
+			return NULL;
+		}
+
+		/* Try the next participant's batch file. */
+		batch_reader->participant_number =
+			(batch_reader->participant_number + 1) %
+				hashtable->shared->planned_participants;
+		if (batch_reader->participant_number == HashJoinParticipantNumber())
+		{
+			/*
+			 * We've made it all the way back to the file we started with,
+			 * which is the one that this backend wrote.  So there are no more
+			 * tuples to be had in any participant's batch file.
+			 */
+			ExecClearTuple(tupleSlot);
+			return NULL;
+		}
 
+		/* Import the BufFile from that participant, if it exported one. */
+		descriptor = get_batch_descriptor(hashtable,
+										  batch_reader->participant_number,
+										  batch_reader->batchno,
+										  batch_reader->inner);
+		if (descriptor == NULL)
+			batch_reader->file = NULL;
+		else
+		{
+			HashJoinParticipantState *participant;
+
+			batch_reader->file = BufFileImport(descriptor);
+			participant =
+				&hashtable->shared->participants[batch_reader->participant_number];
+			if (batch_reader->inner)
+				batch_reader->shared = &participant->inner_batch_reader;
+			else
+				batch_reader->shared = &participant->outer_batch_reader;
+			batch_reader->head.fileno = batch_reader->head.offset = -1;
+		}
+	}
+}
 
 void
 ExecReScanHashJoin(HashJoinState *node)
 {
+	HashState *hashNode = (HashState *) innerPlanState(node);
+
+	/* We can't use HashJoinTableIsShared if the table is NULL. */
+	if (hashNode->shared_table_data != NULL)
+	{
+		elog(ERROR, "TODO: shared ExecReScanHashJoin not implemented");
+
+		/* Coordinate a rewind to the shared hash table creation phase. */
+		BarrierWaitSet(&hashNode->shared_table_data->barrier, PHJ_PHASE_INIT,
+					   WAIT_EVENT_HASHJOIN_REWINDING);
+	}
+
 	/*
 	 * In a multi-batch join, we currently have to do rescans the hard way,
 	 * primarily because batch temp files may have already been released. But
@@ -977,6 +1572,14 @@ ExecReScanHashJoin(HashJoinState *node)
 
 			/* ExecHashJoin can skip the BUILD_HASHTABLE step */
 			node->hj_JoinState = HJ_NEED_NEW_OUTER;
+
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+				/* Coordinate a rewind to the shared probing phase. */
+				BarrierWaitSet(&hashNode->shared_table_data->barrier,
+							   PHJ_PHASE_PROBING,
+							   WAIT_EVENT_HASHJOIN_REWINDING2);
+			}
 		}
 		else
 		{
@@ -985,6 +1588,14 @@ ExecReScanHashJoin(HashJoinState *node)
 			node->hj_HashTable = NULL;
 			node->hj_JoinState = HJ_BUILD_HASHTABLE;
 
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+				/* Coordinate a rewind to the shared hash table creation phase. */
+				BarrierWaitSet(&hashNode->shared_table_data->barrier,
+							   PHJ_PHASE_INIT,
+							   WAIT_EVENT_HASHJOIN_REWINDING3);
+			}
+
 			/*
 			 * if chgParam of subnode is not null then plan will be re-scanned
 			 * by first ExecProcNode.
@@ -1011,3 +1622,110 @@ ExecReScanHashJoin(HashJoinState *node)
 	if (node->js.ps.lefttree->chgParam == NULL)
 		ExecReScan(node->js.ps.lefttree);
 }
+
+void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt)
+{
+	size_t size;
+
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * (pcxt->nworkers + 1);
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+static void
+configure_reader_locks(HashJoinParticipantState *participants, int count)
+{
+	int i;
+
+	static LWLockTranche inner_tranche;
+	static LWLockTranche outer_tranche;
+
+	inner_tranche.name = "Hash Join/inner batch";
+	inner_tranche.array_base =
+		(char *) &participants[0].inner_batch_reader.lock;
+	inner_tranche.array_stride = sizeof(HashJoinParticipantState);
+	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER,
+						  &inner_tranche);
+
+	outer_tranche.name = "Hash Join/outer batch";
+	outer_tranche.array_base =
+		(char *) &participants[0].outer_batch_reader.lock;
+	outer_tranche.array_stride = sizeof(HashJoinParticipantState);
+	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER,
+						  &outer_tranche);
+
+	for (i = 0; i < count; ++i)
+	{
+		LWLockInitialize(&participants[i].inner_batch_reader.lock,
+						 LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER);
+		LWLockInitialize(&participants[i].outer_batch_reader.lock,
+						 LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER);
+	}
+}
+
+void
+ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
+{
+	HashState *hashNode;
+	SharedHashJoinTable shared;
+	size_t size;
+	int planned_participants;
+
+	/*
+	 * Set up the state needed to coordinate access to the shared hash table,
+	 * using the plan node ID as the toc key.
+	 */
+	planned_participants = pcxt->nworkers + 1;	/* possible workers + leader */
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * planned_participants;
+	shared = shm_toc_allocate(pcxt->toc, size);
+	BarrierInit(&shared->barrier, 0);
+	shared->primary_buckets = InvalidDsaPointer;
+	shared->secondary_buckets = InvalidDsaPointer;
+	pg_atomic_init_u32(&shared->next_unmatched_bucket, 0);
+	pg_atomic_init_u64(&shared->total_primary_tuples, 0);
+	pg_atomic_init_u64(&shared->total_secondary_tuples, 0);
+	dsa_pointer_atomic_init(&shared->head_primary_chunk, InvalidDsaPointer);
+	dsa_pointer_atomic_init(&shared->head_secondary_chunk, InvalidDsaPointer);
+	dsa_pointer_atomic_init(&shared->chunks_to_rebucket, InvalidDsaPointer);
+	shared->planned_participants = planned_participants;
+	shm_toc_insert(pcxt->toc, state->js.ps.plan->plan_node_id, shared);
+	configure_reader_locks(shared->participants, planned_participants);
+
+	/*
+	 * Pass the SharedHashJoinTable to the hash node.  If the Gather node
+	 * running in the leader backend decides to execute the hash join, it
+	 * hasn't called ExecHashJoinInitializeWorker so it doesn't have
+	 * state->shared_table_data set up.  So we must do it here.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = shared;
+}
+
+void
+ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc)
+{
+	HashState  *hashNode;
+
+	state->hj_sharedHashJoinTable =
+		shm_toc_lookup(toc, state->js.ps.plan->plan_node_id);
+
+	/*
+	 * Inject SharedHashJoinTable into the hash node.  It could instead have
+	 * its own ExecHashInitializeWorker function, but we only want to set its
+	 * 'parallel_aware' flag if we want to tell it to actually build the hash
+	 * table in parallel.  Since its parallel_aware flag also controls whether
+	 * its 'InitializeWorker' function gets called, and it also needs access
+	 * to this object for serial shared hash mode, we'll pass it on here
+	 * instead of depending on that.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = state->hj_sharedHashJoinTable;
+	Assert(hashNode->shared_table_data != NULL);
+
+	Assert(HashJoinParticipantNumber() <
+		   hashNode->shared_table_data->planned_participants);
+
+	configure_reader_locks(hashNode->shared_table_data->participants, 0);
+}
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 00bf3a5..361eb5d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
 
+#include <unistd.h>
+
 static void InitScanRelation(SeqScanState *node, EState *estate, int eflags);
 static TupleTableSlot *SeqNext(SeqScanState *node);
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ae86954..ca215dd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1993,6 +1993,7 @@ _outHashPath(StringInfo str, const HashPath *node)
 
 	WRITE_NODE_FIELD(path_hashclauses);
 	WRITE_INT_FIELD(num_batches);
+	WRITE_ENUM_FIELD(table_type, HashPathTableType);
 }
 
 static void
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2a49639..79c7650 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -104,6 +104,7 @@
 double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
 double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
+double		cpu_shared_tuple_cost = DEFAULT_CPU_SHARED_TUPLE_COST;
 double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
@@ -2694,7 +2695,8 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors)
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type)
 {
 	Cost		startup_cost = 0;
 	Cost		run_cost = 0;
@@ -2725,6 +2727,26 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
 
 	/*
+	 * If this is a shared hash table, there is a extra charge for inserting
+	 * each tuple into the shared hash table, to cover the overhead of memory
+	 * synchronization that makes the hash table slightly slower to build than
+	 * a private hash table.  There is no extra charge for probing the hash
+	 * table for outer path row, on the basis that read-only access to the
+	 * hash table shouldn't generate any extra memory synchronization.
+	 *
+	 * TODO: Really what we want is some guess at the number of cache sync
+	 * overhead generated by inserting into cachelines that have been
+	 * invalidated by someone else inserting into a bucket in the same
+	 * cacheline.  Not sure if it's better to introduce a
+	 * cpu_cacheline_sync_cost (or _miss_cost?) and then here estimate the
+	 * number of collisions we expect based by num buckets, cacheline size,
+	 * num workers.  But that might be too detailed/low level/variable
+	 * heavy/bogus.
+	 */
+	if (table_type != HASHPATH_TABLE_PRIVATE)
+		startup_cost += cpu_shared_tuple_cost * inner_path_rows;
+
+	/*
 	 * Get hash table size that executor would use for inner relation.
 	 *
 	 * XXX for the moment, always assume that skew optimization will be
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cc7384f..87c4cef 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -483,7 +483,8 @@ try_hashjoin_path(PlannerInfo *root,
 				  Path *inner_path,
 				  List *hashclauses,
 				  JoinType jointype,
-				  JoinPathExtraData *extra)
+				  JoinPathExtraData *extra,
+				  HashPathTableType table_type)
 {
 	Relids		required_outer;
 	JoinCostWorkspace workspace;
@@ -508,7 +509,7 @@ try_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors, table_type);
 
 	if (add_path_precheck(joinrel,
 						  workspace.startup_cost, workspace.total_cost,
@@ -525,7 +526,8 @@ try_hashjoin_path(PlannerInfo *root,
 									  inner_path,
 									  extra->restrictlist,
 									  required_outer,
-									  hashclauses));
+									  hashclauses,
+									  table_type));
 	}
 	else
 	{
@@ -546,7 +548,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 						  Path *inner_path,
 						  List *hashclauses,
 						  JoinType jointype,
-						  JoinPathExtraData *extra)
+						  JoinPathExtraData *extra,
+						  HashPathTableType table_type)
 {
 	JoinCostWorkspace workspace;
 
@@ -571,7 +574,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors,
+						  table_type);
 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
 		return;
 
@@ -587,7 +591,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 										  inner_path,
 										  extra->restrictlist,
 										  NULL,
-										  hashclauses));
+										  hashclauses,
+										  table_type));
 }
 
 /*
@@ -1356,7 +1361,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			/* no possibility of cheap startup here */
 		}
 		else if (jointype == JOIN_UNIQUE_INNER)
@@ -1372,7 +1378,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			if (cheapest_startup_outer != NULL &&
 				cheapest_startup_outer != cheapest_total_outer)
 				try_hashjoin_path(root,
@@ -1381,7 +1388,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 		}
 		else
 		{
@@ -1402,7 +1410,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 
 			foreach(lc1, outerrel->cheapest_parameterized_paths)
 			{
@@ -1436,7 +1445,8 @@ hash_inner_and_outer(PlannerInfo *root,
 									  innerpath,
 									  hashclauses,
 									  jointype,
-									  extra);
+									  extra,
+									  HASHPATH_TABLE_PRIVATE);
 				}
 			}
 		}
@@ -1445,23 +1455,32 @@ hash_inner_and_outer(PlannerInfo *root,
 		 * If the joinrel is parallel-safe, we may be able to consider a
 		 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
 		 * because the outer path will be partial, and therefore we won't be
-		 * able to properly guarantee uniqueness.  Similarly, we can't handle
-		 * JOIN_FULL and JOIN_RIGHT, because they can produce false null
-		 * extended rows.  Also, the resulting path must not be parameterized.
+		 * able to properly guarantee uniqueness.  Also, the resulting path
+		 * must not be parameterized.
 		 */
 		if (joinrel->consider_parallel &&
 			jointype != JOIN_UNIQUE_OUTER &&
-			jointype != JOIN_FULL &&
-			jointype != JOIN_RIGHT &&
 			outerrel->partial_pathlist != NIL &&
 			bms_is_empty(joinrel->lateral_relids))
 		{
 			Path	   *cheapest_partial_outer;
+			Path	   *cheapest_partial_inner = NULL;
 			Path	   *cheapest_safe_inner = NULL;
 
 			cheapest_partial_outer =
 				(Path *) linitial(outerrel->partial_pathlist);
 
+			/* Can we use a partial inner plan too? */
+			if (innerrel->partial_pathlist != NIL)
+				cheapest_partial_inner =
+					(Path *) linitial(innerrel->partial_pathlist);
+			if (cheapest_partial_inner != NULL)
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_partial_inner,
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_PARALLEL);
+
 			/*
 			 * Normally, given that the joinrel is parallel-safe, the cheapest
 			 * total inner path will also be parallel-safe, but if not, we'll
@@ -1488,10 +1507,20 @@ hash_inner_and_outer(PlannerInfo *root,
 			}
 
 			if (cheapest_safe_inner != NULL)
+			{
+				/* Try a shared table with only one worker building the table. */
 				try_partial_hashjoin_path(root, joinrel,
 										  cheapest_partial_outer,
 										  cheapest_safe_inner,
-										  hashclauses, jointype, extra);
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_SERIAL);
+				/* Also private hash tables, built by each worker. */
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_safe_inner,
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_PRIVATE);
+			}
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ad49674..4954c4c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3938,6 +3938,23 @@ create_hashjoin_plan(PlannerInfo *root,
 	copy_plan_costsize(&hash_plan->plan, inner_plan);
 	hash_plan->plan.startup_cost = hash_plan->plan.total_cost;
 
+	/*
+	 * Set the table as sharable if appropriate, with parallel or serial
+	 * building.
+	 */
+	switch (best_path->table_type)
+	{
+	case HASHPATH_TABLE_SHARED_PARALLEL:
+		hash_plan->shared_table = true;
+		hash_plan->plan.parallel_aware = true;
+		break;
+	case HASHPATH_TABLE_SHARED_SERIAL:
+		hash_plan->shared_table = true;
+		break;
+	case HASHPATH_TABLE_PRIVATE:
+		break;
+	}
+
 	join_plan = make_hashjoin(tlist,
 							  joinclauses,
 							  otherclauses,
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..68cabe6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2096,6 +2096,7 @@ create_mergejoin_path(PlannerInfo *root,
  * 'required_outer' is the set of required outer rels
  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
  *		(this should be a subset of the restrict_clauses list)
+ * 'table_type' for level of hash table sharing
  */
 HashPath *
 create_hashjoin_path(PlannerInfo *root,
@@ -2108,7 +2109,8 @@ create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses)
+					 List *hashclauses,
+					 HashPathTableType table_type)
 {
 	HashPath   *pathnode = makeNode(HashPath);
 
@@ -2123,9 +2125,13 @@ create_hashjoin_path(PlannerInfo *root,
 								  sjinfo,
 								  required_outer,
 								  &restrict_clauses);
-	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_aware =
+		joinrel->consider_parallel &&
+		(table_type == HASHPATH_TABLE_SHARED_SERIAL ||
+		 table_type == HASHPATH_TABLE_SHARED_PARALLEL);
 	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
+	pathnode->table_type = table_type;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
 	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a392197..c1e8819 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3393,6 +3393,51 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_HASH_CREATING:
+			event_name = "Hash/Creating";
+			break;
+		case WAIT_EVENT_HASH_HASHING:
+			event_name = "Hash/Hashing";
+			break;
+		case WAIT_EVENT_HASH_RESIZING:
+			event_name = "Hash/Resizing";
+			break;
+		case WAIT_EVENT_HASH_REBUCKETING:
+			event_name = "Hash/Rebucketing";
+			break;
+		case WAIT_EVENT_HASH_INIT:
+			event_name = "Hash/Init";
+			break;
+		case WAIT_EVENT_HASH_DESTROY:
+			event_name = "Hash/Destroy";
+			break;
+		case WAIT_EVENT_HASH_UNMATCHED:
+			event_name = "Hash/Unmatched";
+			break;
+		case WAIT_EVENT_HASH_PROMOTING:
+			event_name = "Hash/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROMOTING:
+			event_name = "HashJoin/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PREPARING:
+			event_name = "HashJoin/Preparing";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROBING:
+			event_name = "HashJoin/Probing";
+			break;
+		case WAIT_EVENT_HASHJOIN_LOADING:
+			event_name = "HashJoin/Loading";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING:
+			event_name = "HashJoin/Rewinding";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING2:
+			event_name = "HashJoin/Rewinding2";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING3:
+			event_name = "HashJoin/Rewinding3";;
+			break;
 		/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 042be79..0fc8404 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -42,6 +42,8 @@
 #include "storage/buf_internals.h"
 #include "utils/resowner.h"
 
+extern int ParallelWorkerNumber;
+
 /*
  * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
  * The reason is that we'd like large temporary BufFiles to be spread across
@@ -89,6 +91,24 @@ struct BufFile
 	char		buffer[BLCKSZ];
 };
 
+/*
+ * Serialized representation of a single file managed by a BufFile.
+ */
+typedef struct BufFileFileDescriptor
+{
+	char path[MAXPGPATH];
+} BufFileFileDescriptor;
+
+/*
+ * Serialized representation of a BufFile, to be created by BufFileExport and
+ * consumed by BufFileImport.
+ */
+struct BufFileDescriptor
+{
+	size_t num_files;
+	BufFileFileDescriptor files[FLEXIBLE_ARRAY_MEMBER];
+};
+
 static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
@@ -178,6 +198,77 @@ BufFileCreateTemp(bool interXact)
 	return file;
 }
 
+/*
+ * Export a BufFile description in a serialized form so that another backend
+ * can attach to it and read from it.  The format is opaque, but it may be
+ * bitwise copied, and its size may be obtained with BufFileDescriptorSize().
+ */
+BufFileDescriptor *
+BufFileExport(BufFile *file)
+{
+	BufFileDescriptor *descriptor;
+	int i;
+
+	/* Flush output from local buffers. */
+	BufFileFlush(file);
+
+	/* Create and fill in a descriptor. */
+	descriptor = palloc0(offsetof(BufFileDescriptor, files) +
+						 sizeof(BufFileFileDescriptor) * file->numFiles);
+	descriptor->num_files = file->numFiles;
+	for (i = 0; i < descriptor->num_files; ++i)
+		strcpy(descriptor->files[i].path, FilePathName(file->files[i]));
+
+	return descriptor;
+}
+
+/*
+ * Return the size in bytes of a BufFileDescriptor, so that it can be copied.
+ */
+size_t
+BufFileDescriptorSize(const BufFileDescriptor *descriptor)
+{
+	return offsetof(BufFileDescriptor, files) +
+		sizeof(BufFileFileDescriptor) * descriptor->num_files;
+}
+
+/*
+ * Open a BufFile that was created by another backend and then exported.  The
+ * file must be read-only in all backends, and is still owned by the backend
+ * that created it.  This provides a way for cooperating backends to share
+ * immutable temporary data such as hash join batches.
+ */
+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{
+	BufFile    *file = (BufFile *) palloc(sizeof(BufFile));
+	int i;
+
+	file->numFiles = descriptor->num_files;
+	file->files = (File *) palloc0(sizeof(File) * descriptor->num_files);
+	file->offsets = (off_t *) palloc0(sizeof(off_t) * descriptor->num_files);
+	file->isTemp = false;
+	file->isInterXact = true; /* prevent cleanup by this backend */
+	file->dirty = false;
+	file->resowner = CurrentResourceOwner;
+	file->curFile = 0;
+	file->curOffset = 0L;
+	file->pos = 0;
+	file->nbytes = 0;
+
+	for (i = 0; i < descriptor->num_files; ++i)
+	{
+		file->files[i] =
+			PathNameOpenFile(descriptor->files[i].path,
+							 O_RDONLY | PG_BINARY, 0600);
+		if (file->files[i] <= 0)
+			elog(ERROR, "failed to import \"%s\": %m",
+				 descriptor->files[i].path);
+	}
+
+	return file;
+}
+
 #ifdef NOT_USED
 /*
  * Create a BufFile and attach it to an already-opened virtual File.
diff --git a/src/backend/storage/ipc/barrier.c b/src/backend/storage/ipc/barrier.c
index 8b83c1d..5a45103 100644
--- a/src/backend/storage/ipc/barrier.c
+++ b/src/backend/storage/ipc/barrier.c
@@ -16,6 +16,7 @@
 
 #include "storage/barrier.h"
 
+
 /*
  * Initialize this barrier, setting a static number of participants that we
  * will wait for at each computation phase.  To use a dynamic number of
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2d3cf9e..6c79733 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -788,7 +788,6 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 				raw_wait_event = UINT32_ACCESS_ONCE(proc->wait_event_info);
 				wait_event_type = pgstat_get_wait_event_type(raw_wait_event);
 				wait_event = pgstat_get_wait_event(raw_wait_event);
-
 			}
 			else
 			{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 65660c1..9b49918 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2857,6 +2857,16 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 	{
+		{"cpu_shared_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
+			gettext_noop("Sets the planner's estimate of the cost of "
+						 "sharing each tuple with other parallel workers."),
+			NULL
+		},
+		&cpu_shared_tuple_cost,
+		DEFAULT_CPU_TUPLE_COST, -DBL_MAX, DBL_MAX,
+		NULL, NULL, NULL
+	},
+	{
 		{"cpu_index_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
 			gettext_noop("Sets the planner's estimate of the cost of "
 						 "processing each index entry during an index scan."),
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 6d0e12b..1bbf376 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -15,7 +15,13 @@
 #define HASHJOIN_H
 
 #include "nodes/execnodes.h"
+#include "port/atomics.h"
+#include "storage/barrier.h"
 #include "storage/buffile.h"
+#include "storage/dsa.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/spin.h"
 
 /* ----------------------------------------------------------------
  *				hash-join hash table structures
@@ -63,7 +69,12 @@
 
 typedef struct HashJoinTupleData
 {
-	struct HashJoinTupleData *next;		/* link to next tuple in same bucket */
+	/* link to next tuple in same bucket */
+	union
+	{
+		dsa_pointer shared;
+		struct HashJoinTupleData *private;
+	} next;
 	uint32		hashvalue;		/* tuple's hash code */
 	/* Tuple data, in MinimalTuple format, follows on a MAXALIGN boundary */
 }	HashJoinTupleData;
@@ -94,7 +105,12 @@ typedef struct HashJoinTupleData
 typedef struct HashSkewBucket
 {
 	uint32		hashvalue;		/* common hash value */
-	HashJoinTuple tuples;		/* linked list of inner-relation tuples */
+	/* linked list of inner-relation tuples */
+	union
+	{
+		dsa_pointer shared;
+		HashJoinTuple private;
+	} tuples;
 } HashSkewBucket;
 
 #define SKEW_BUCKET_OVERHEAD  MAXALIGN(sizeof(HashSkewBucket))
@@ -103,8 +119,9 @@ typedef struct HashSkewBucket
 #define SKEW_MIN_OUTER_FRACTION  0.01
 
 /*
- * To reduce palloc overhead, the HashJoinTuples for the current batch are
- * packed in 32kB buffers instead of pallocing each tuple individually.
+ * To reduce palloc/dsa_allocate overhead, the HashJoinTuples for the current
+ * batch are packed in 32kB buffers instead of pallocing each tuple
+ * individually.
  */
 typedef struct HashMemoryChunkData
 {
@@ -112,17 +129,120 @@ typedef struct HashMemoryChunkData
 	size_t		maxlen;			/* size of the buffer holding the tuples */
 	size_t		used;			/* number of buffer bytes already used */
 
-	struct HashMemoryChunkData *next;	/* pointer to the next chunk (linked
-										 * list) */
+	/* pointer to the next chunk (linked  list) */
+	union
+	{
+		dsa_pointer shared;
+		struct HashMemoryChunkData *private;
+	} next;
 
 	char		data[FLEXIBLE_ARRAY_MEMBER];	/* buffer allocated at the end */
 }	HashMemoryChunkData;
 
 typedef struct HashMemoryChunkData *HashMemoryChunk;
 
+
+
 #define HASH_CHUNK_SIZE			(32 * 1024L)
 #define HASH_CHUNK_THRESHOLD	(HASH_CHUNK_SIZE / 4)
 
+/*
+ * Read head position in a shared batch file.
+ */
+typedef struct HashJoinBatchPosition
+{
+	int fileno;
+	off_t offset;
+} HashJoinBatchPosition;
+
+/*
+ * The state exposed in shared memory for each participant to coordinate
+ * reading of batch files that it wrote.
+ */
+typedef struct HashJoinSharedBatchReader
+{
+	int batchno;				/* the batch number we are currently reading */
+
+	LWLock lock;				/* protects access to the members below */
+	bool error;					/* has an IO error occurred? */
+	HashJoinBatchPosition head;	/* shared read head for current batch */
+} HashJoinSharedBatchReader;
+
+/*
+ * The state exposed in shared memory by each participant allowing its batch
+ * files to be read by other participants.
+ */
+typedef struct HashJoinParticipantState
+{
+	/*
+	 * Arrays of pointers to arrays of pointers to BufFileDesciptor objects
+	 * exported by this participant.  The descriptor for batch i is in slot
+	 * i % (1 << fls(i - 1)) of the array at index fls(i).
+	 *
+	 * This arrangement means that we can modify future batches without
+	 * moving/reallocating the current batch.  The current batch is therefore
+	 * immutable and accessible by other backends which need to read it.
+	 */
+	dsa_pointer inner_batch_descriptors[32];	/* number of bits in batchno */
+	dsa_pointer outer_batch_descriptors[32];
+
+	/*
+	 * The shared state used to coordinate reading from the current batch.  We
+	 * need separate objects for the outer and inner side, because in the
+	 * probing phase some participants can be reading from the outer batch,
+	 * while others can be reading from the inner side to preload the next
+	 * batch.
+	 */
+	HashJoinSharedBatchReader inner_batch_reader;
+	HashJoinSharedBatchReader outer_batch_reader;
+} HashJoinParticipantState;
+
+/*
+ * The state used by each backend to manage reading from batch files written
+ * by all participants.
+ */
+typedef struct HashJoinBatchReader
+{
+	int participant_number;				/* read which participant's batch? */
+	int batchno;						/* which batch are we reading? */
+	bool inner;							/* inner or outer? */
+	HashJoinSharedBatchReader *shared;	/* holder of the shared read head */
+	BufFile *file;						/* the file opened in this backend */
+	HashJoinBatchPosition head;			/* local read head position */
+} HashJoinBatchReader;
+
+/*
+ * State for a shared hash join table.  Each backend participating in a hash
+ * join with a shared hash table also has a HashJoinTableData object in
+ * backend-private memory, which points to this shared state in the DSM
+ * segment.
+ */
+typedef struct SharedHashJoinTableData
+{
+	Barrier barrier;				/* for synchronizing workers */
+	dsa_pointer primary_buckets;	/* primary hash table */
+	dsa_pointer secondary_buckets;	/* hash table for preloading next batch */
+	bool at_least_one_worker;		/* did at least one worker join in time? */
+	int nbuckets;
+	int nbuckets_optimal;
+	pg_atomic_uint32 next_unmatched_bucket;
+	pg_atomic_uint64 total_primary_tuples;
+	pg_atomic_uint64 total_secondary_tuples;
+	dsa_pointer_atomic head_primary_chunk;
+	dsa_pointer_atomic head_secondary_chunk;
+	dsa_pointer_atomic chunks_to_rebucket;
+	int planned_participants;		/* number of planned workers + leader */
+
+	/* state exposed by each participant for sharing batches */
+	HashJoinParticipantState participants[FLEXIBLE_ARRAY_MEMBER];
+} SharedHashJoinTableData;
+
+typedef union HashJoinBucketHead
+{
+	dsa_pointer_atomic shared;
+	HashJoinTuple private;
+} HashJoinBucketHead;
+
 typedef struct HashJoinTableData
 {
 	int			nbuckets;		/* # buckets in the in-memory hash table */
@@ -134,9 +254,11 @@ typedef struct HashJoinTableData
 	int			log2_nbuckets_optimal;	/* log2(nbuckets_optimal) */
 
 	/* buckets[i] is head of list of tuples in i'th in-memory bucket */
-	struct HashJoinTupleData **buckets;
+	HashJoinBucketHead *buckets;
 	/* buckets array is per-batch storage, as are all the tuples */
 
+	HashJoinBucketHead *next_buckets;	/* for preloading next batch */
+
 	bool		keepNulls;		/* true to store unmatchable NULL tuples */
 
 	bool		skewEnabled;	/* are we using skew optimization? */
@@ -185,7 +307,73 @@ typedef struct HashJoinTableData
 	MemoryContext batchCxt;		/* context for this-batch-only storage */
 
 	/* used for dense allocation of tuples (into linked chunks) */
-	HashMemoryChunk chunks;		/* one list for the whole batch */
+	HashMemoryChunk primary_chunk;		/* current chunk for this batch */
+	HashMemoryChunk secondary_chunk;	/* current chunk for next batch */
+	HashMemoryChunk chunks_to_rebucket;	/* after resizing table */
+	dsa_pointer primary_chunk_shared;	/* DSA pointer to primary_chunk */
+	dsa_pointer secondary_chunk_shared;	/* DSA pointer to secondary_chunk */
+
+	/* State for coordinating shared tables for parallel hash joins. */
+	dsa_area *area;
+	SharedHashJoinTableData *shared;	/* the shared state */
+	int attached_at_phase;				/* the phase this participant joined */
+	bool detached_early;				/* did we decide to detach early? */
+	HashJoinBatchReader batch_reader;	/* state for reading batches in */
 }	HashJoinTableData;
 
+/* Check if a HashJoinTable is shared by parallel workers. */
+#define HashJoinTableIsShared(table) ((table)->shared != NULL)
+
+/* The phases of parallel hash computation. */
+#define PHJ_PHASE_INIT					0
+#define PHJ_PHASE_CREATING				1
+#define PHJ_PHASE_HASHING				2
+#define PHJ_PHASE_RESIZING  			3
+#define PHJ_PHASE_REBUCKETING 			4
+#define PHJ_PHASE_PROBING				5	/* PHJ_PHASE_PROBING_BATCH(0) */
+#define PHJ_PHASE_UNMATCHED				6	/* PHJ_PHASE_UNMATCHED_BATCH(0) */
+
+/* The subphases for batches. */
+#define PHJ_SUBPHASE_PROMOTING			0
+#define PHJ_SUBPHASE_LOADING			1
+#define PHJ_SUBPHASE_PREPARING			2
+#define PHJ_SUBPHASE_PROBING			3
+#define PHJ_SUBPHASE_UNMATCHED			4
+
+/* The phases of parallel processing for batch(n). */
+#define PHJ_PHASE_PROMOTING_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 4)
+#define PHJ_PHASE_LOADING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 5 - 3)
+#define PHJ_PHASE_PREPARING_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 2)
+#define PHJ_PHASE_PROBING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 5 - 1)
+#define PHJ_PHASE_UNMATCHED_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 0)
+
+/* Phase number -> sub-phase within a batch. */
+#define PHJ_PHASE_TO_SUBPHASE(p)										\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) % 5)
+
+/* Phase number -> batch number. */
+#define PHJ_PHASE_TO_BATCHNO(p)											\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) / 5)
+
+/*
+ * Is a given phase one in which a new hash table array is being assigned by
+ * one elected backend?  That includes initial creation, reallocation during
+ * resize, and promotion of secondary hash table to primary.  Workers that
+ * show up and attach at an arbitrary time must wait such phases out before
+ * doing anything with the hash table.
+ */
+#define PHJ_PHASE_MUTATING_TABLE(p)									\
+	((p) == PHJ_PHASE_CREATING ||									\
+	 (p) == PHJ_PHASE_RESIZING ||									\
+	 (PHJ_PHASE_TO_BATCHNO(p) > 0 &&								\
+	  PHJ_PHASE_TO_SUBPHASE(p) == PHJ_SUBPHASE_PROMOTING))
+
+/*
+ * Return the 'participant number' for a process participating in a parallel
+ * hash join.  We give a number < hashtable->shared->planned_participants
+ * to each potential participant, including the leader.
+ */
+#define HashJoinParticipantNumber() \
+	(IsParallelWorker() ? ParallelWorkerNumber + 1 : 0)
+
 #endif   /* HASHJOIN_H */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index 8cf6d15..b1e80f3 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -22,12 +22,12 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(Hash *node, List *hashOperators,
+extern HashJoinTable ExecHashTableCreate(HashState *node, List *hashOperators,
 					bool keepNulls);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
 extern void ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue);
+					uint32 hashvalue, bool secondary);
 extern bool ExecHashGetHashValue(HashJoinTable hashtable,
 					 ExprContext *econtext,
 					 List *hashkeys,
@@ -49,5 +49,7 @@ extern void ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 						int *numbatches,
 						int *num_skew_mcvs);
 extern int	ExecHashGetSkewBucket(HashJoinTable hashtable, uint32 hashvalue);
+extern void ExecHashUpdate(HashJoinTable hashtable);
+extern bool ExecHashCheckForEarlyExit(HashJoinTable hashtable);
 
 #endif   /* NODEHASH_H */
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index f24127a..d123e7e 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -14,15 +14,27 @@
 #ifndef NODEHASHJOIN_H
 #define NODEHASHJOIN_H
 
+#include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "storage/buffile.h"
+#include "storage/shm_toc.h"
 
 extern HashJoinState *ExecInitHashJoin(HashJoin *node, EState *estate, int eflags);
 extern TupleTableSlot *ExecHashJoin(HashJoinState *node);
 extern void ExecEndHashJoin(HashJoinState *node);
 extern void ExecReScanHashJoin(HashJoinState *node);
 
-extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr);
+extern void ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno, bool inner);
+extern void ExecHashJoinRewindBatches(HashJoinTable hashtable, int batchno);
+extern void ExecHashJoinOpenBatch(HashJoinTable hashtable,
+					  int batchno, bool inner);
+extern void ExecHashJoinCloseBatch(HashJoinTable hashtable,
+					  int batchno, bool inner);
+
+extern void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc);
 
 #endif   /* NODEHASHJOIN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2fadf76..9ae55be 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1738,6 +1738,7 @@ typedef struct MergeJoinState
 /* these structs are defined in executor/hashjoin.h: */
 typedef struct HashJoinTupleData *HashJoinTuple;
 typedef struct HashJoinTableData *HashJoinTable;
+typedef struct SharedHashJoinTableData *SharedHashJoinTable;
 
 typedef struct HashJoinState
 {
@@ -1759,6 +1760,7 @@ typedef struct HashJoinState
 	int			hj_JoinState;
 	bool		hj_MatchedOuter;
 	bool		hj_OuterNotEmpty;
+	SharedHashJoinTable hj_sharedHashJoinTable;
 } HashJoinState;
 
 
@@ -1982,6 +1984,9 @@ typedef struct HashState
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
 	List	   *hashkeys;		/* list of ExprState nodes */
 	/* hashkeys is same as parent's hj_InnerHashKeys */
+
+	/* The following are the same as the parent's. */
+	SharedHashJoinTable shared_table_data;
 } HashState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index e2fbc7d..e8f90d9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -782,6 +782,7 @@ typedef struct Hash
 	bool		skewInherit;	/* is outer join rel an inheritance tree? */
 	Oid			skewColType;	/* datatype of the outer key column */
 	int32		skewColTypmod;	/* typmod of the outer key column */
+	bool		shared_table;	/* table shared by multiple workers? */
 	/* all other info is in the parent HashJoin node */
 } Hash;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3a1255a..8b06551 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1258,6 +1258,16 @@ typedef struct MergePath
 	bool		materialize_inner;		/* add Materialize to inner? */
 } MergePath;
 
+typedef enum
+{
+	/* Every worker builds its own private copy of the hash table. */
+	HASHPATH_TABLE_PRIVATE,
+	/* One worker builds a shared hash table, and all workers probe it. */
+	HASHPATH_TABLE_SHARED_SERIAL,
+	/* All workers build a shared hash table, and then probe it. */
+	HASHPATH_TABLE_SHARED_PARALLEL
+} HashPathTableType;
+
 /*
  * A hashjoin path has these fields.
  *
@@ -1272,6 +1282,7 @@ typedef struct HashPath
 	JoinPath	jpath;
 	List	   *path_hashclauses;		/* join clauses used for hashing */
 	int			num_batches;	/* number of batches expected */
+	HashPathTableType table_type;		/* level of sharedness */
 } HashPath;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 2a4df2f..7bb0d1d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -24,6 +24,7 @@
 #define DEFAULT_SEQ_PAGE_COST  1.0
 #define DEFAULT_RANDOM_PAGE_COST  4.0
 #define DEFAULT_CPU_TUPLE_COST	0.01
+#define DEFAULT_CPU_SHARED_TUPLE_COST 0.0
 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
 #define DEFAULT_CPU_OPERATOR_COST  0.0025
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
@@ -48,6 +49,7 @@ typedef enum
 extern PGDLLIMPORT double seq_page_cost;
 extern PGDLLIMPORT double random_page_cost;
 extern PGDLLIMPORT double cpu_tuple_cost;
+extern PGDLLIMPORT double cpu_shared_tuple_cost;
 extern PGDLLIMPORT double cpu_index_tuple_cost;
 extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
@@ -144,7 +146,8 @@ extern void initial_cost_hashjoin(PlannerInfo *root,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors);
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type);
 extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 					JoinCostWorkspace *workspace,
 					SpecialJoinInfo *sjinfo,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 71d9154..5f4ca87 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -134,7 +134,8 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses);
+					 List *hashclauses,
+					 HashPathTableType table_type);
 
 extern ProjectionPath *create_projection_path(PlannerInfo *root,
 					   RelOptInfo *rel,
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0b85b7a..0157d52 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -785,7 +785,22 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_HASH_CREATING,
+	WAIT_EVENT_HASH_HASHING,
+	WAIT_EVENT_HASH_RESIZING,
+	WAIT_EVENT_HASH_REBUCKETING,
+	WAIT_EVENT_HASH_INIT,
+	WAIT_EVENT_HASH_DESTROY,
+	WAIT_EVENT_HASH_UNMATCHED,
+	WAIT_EVENT_HASH_PROMOTING,
+	WAIT_EVENT_HASHJOIN_PROMOTING,
+	WAIT_EVENT_HASHJOIN_PROBING,
+	WAIT_EVENT_HASHJOIN_LOADING,
+	WAIT_EVENT_HASHJOIN_PREPARING,
+	WAIT_EVENT_HASHJOIN_REWINDING,
+	WAIT_EVENT_HASHJOIN_REWINDING2, /* TODO: rename me */
+	WAIT_EVENT_HASHJOIN_REWINDING3 /* TODO: rename me */
 } WaitEventIPC;
 
 /* ----------
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 809e596..044262d 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -30,12 +30,17 @@
 
 typedef struct BufFile BufFile;
 
+typedef struct BufFileDescriptor BufFileDescriptor;
+
 /*
  * prototypes for functions in buffile.c
  */
 
 extern BufFile *BufFileCreateTemp(bool interXact);
 extern void BufFileClose(BufFile *file);
+extern BufFileDescriptor *BufFileExport(BufFile *file);
+extern BufFile *BufFileImport(BufFileDescriptor *descriptor);
+extern size_t BufFileDescriptorSize(const BufFileDescriptor *descriptor);
 extern size_t BufFileRead(BufFile *file, void *ptr, size_t size);
 extern size_t BufFileWrite(BufFile *file, void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 951e421..7af6e04 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -236,6 +236,8 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LOCK_MANAGER,
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
 	LWTRANCHE_PARALLEL_EXEC_AREA,
+	LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER,
+	LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER,
 	LWTRANCHE_FIRST_USER_DEFINED
 }	BuiltinTrancheIds;

Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Thomas Munro (#3)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Nov 3, 2016 at 4:19 PM, Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

Obviously I'm actively working on developing and stabilising all this.
Some of the things I'm working on are: work_mem accounting, batch
increases, rescans and figuring out if the resource management for
those BufFiles is going to work. There are quite a lot of edge cases
some of which I'm still figuring out, but I feel like this approach is
workable. At this stage I want to share what I'm doing to see if
others have feedback, ideas, blood curdling screams of horror, etc. I
will have better patches and a set of test queries soon. Thanks for
reading.

This patch doesn't receive any review. Patch is not applying properly to
HEAD.
Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Haribabu Kommi (#4)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Dec 3, 2016 at 1:38 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Moved to next CF with "waiting on author" status.

Unfortunately it's been a bit trickier than I anticipated to get the
interprocess batch file sharing and hash table shrinking working
correctly and I don't yet have a new patch in good enough shape to
post in time for the January CF. More soon.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Thomas Munro (#5)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Dec 31, 2016 at 2:52 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Unfortunately it's been a bit trickier than I anticipated to get the
interprocess batch file sharing and hash table shrinking working
correctly and I don't yet have a new patch in good enough shape to
post in time for the January CF. More soon.

I noticed a bug in your latest revision:

+   /*
+    * In HJ_NEED_NEW_OUTER, we already selected the current inner batch for
+    * reading from.  If there is a shared hash table, we may have already
+    * partially loaded the hash table in ExecHashJoinPreloadNextBatch.
+    */
+   Assert(hashtable->batch_reader.batchno = curbatch);
+   Assert(hashtable->batch_reader.inner);

Obviously this isn't supposed to be an assignment.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Peter Geoghegan (#6)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Jan 2, 2017 at 3:17 PM, Peter Geoghegan <pg@heroku.com> wrote:

I noticed a bug in your latest revision:

+   /*
+    * In HJ_NEED_NEW_OUTER, we already selected the current inner batch for
+    * reading from.  If there is a shared hash table, we may have already
+    * partially loaded the hash table in ExecHashJoinPreloadNextBatch.
+    */
+   Assert(hashtable->batch_reader.batchno = curbatch);
+   Assert(hashtable->batch_reader.inner);

Obviously this isn't supposed to be an assignment.

Right, thanks! I will post a new rebased version soon with that and
some other nearby problems fixed.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#7)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Jan 3, 2017 at 10:53 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I will post a new rebased version soon with that and
some other nearby problems fixed.

Here is a new WIP patch. I have plenty of things to tidy up (see note
at end), but the main ideas are now pretty clear and I'd appreciate
some feedback. The main changes since the last patch, other than
debugging, are:

* the number of batches now increases if work_mem would be exceeded;
the work of 'shrinking' the hash table in memory in that case is done
in parallel

* work_mem accounting is done at chunk level, instead of tuples

* interlocking has been rethought

Previously, I had some ideas about using some lock free tricks for
managing chunks of memory, but you may be relieved to hear that I
abandoned those plans. Now, atomic ops are used only for one thing:
pushing tuples into the shared hash table buckets. An LWLock called
'chunk_lock' protects various linked lists of chunks of memory, and
also the shared work_mem accounting. The idea is that backends can
work independently on HASH_CHUNK_SIZE blocks of tuples at a time in
between needing to acquire that lock briefly. Also, there is now a
second barrier, used to coordinate hash table shrinking. This can
happen any number of times during PHJ_PHASE_HASHING and
PHJ_PHASE_LOADING_BATCH(n) phases as required to stay under work_mem,
so it needed to be a separate barrier.

The communication in this patch is a bit more complicated than other
nearby parallel query projects I've looked at; probably the worst bit
is the leader deadlock avoidance stuff (see
ExecHashCheckForEarlyExit), and the second worst bit is probably the
switch statements for allowing participants to show up late and get in
sync, which makes that other problem even more annoying; without those
problems and with just the right kind of reusable shared tuplestore,
this would be a vastly simpler patch. Those are not really
fundamental problems of parallel joins using a shared hash tables, but
they're problems I don't have a better solution to right now.

Stepping back a bit, I am aware of the following approaches to hash
join parallelism:

1. Run the inner plan and build a private hash table in each
participant, and then scatter the outer plan arbitrarily across
participants. This is what 9.6 does, and it's a good plan for small
hash tables with fast inner plans, but a terrible plan for expensive
or large inner plans. Communication overhead: zero; CPU overhead:
runs the inner plan in k workers simultaneously; memory overhead:
builds k copies of the hashtable; disk overhead: may need to spill k
copies of all batches to disk if work_mem exceeded; restrictions:
Can't do right/full joins because no shared 'matched' flags.

2. Run a partition-wise hash join[1]/messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com. Communication overhead: zero;
CPU overhead: zero; memory overhead: zero; disk overhead: zero;
restrictions: the schema must include compatible partition keys, and
potential parallelism is limited by the number of partitions.

3. Repartition the data on the fly, and then run a partition-wise
hash join. Communication overhead: every tuple on at least one and
possibly both sides must be rerouted to the correct participant; CPU
overhead: zero, once repartitioning is done; memory overhead: none;
disk overhead: may need to spill partitions to disk if work_mem is
exceeded

4. Scatter both the inner and outer plans arbitrarily across
participants (ie uncorrelated partitioning), and build a shared hash
table. Communication overhead: synchronisation of build/probe phases,
but no tuple rerouting; CPU overhead: none; memory overhead: none;
disk overhead: may need to spill batches to disk; restrictions: none
in general, but currently we have to drop the leader after the first
batch of a multi-batch join due to our consumer/producer leader
problem mentioned in earlier messages.

We have 1. This proposal aims to provide 4. It seems we have 2 on
the way (that technique works for all 3 join algorithms without any
changes to the join operators and looks best by any measure, but is
limited by the user's schema, ie takes careful planning on the user's
part instead of potentially helping any join). Other databases
including SQL Server offer 3. I suspect that 4 is probably a better
fit than 3 for Postgres today, because the communication overhead of
shovelling nearly all tuples through extra tuple queues to route them
to the right hash table would surely be very high, though I can see
that it's very attractive to have a reusable tuple repartitioning
operator and then run k disjoint communication-free joins (again,
without code change to the join operator, and to the benefit of all
join operators).

About the shared batch reading code: this patch modifies BufFile so
that a temporary file can be shared read-only with other participants,
and then introduces a mechanism for coordinating shared reads. Each
worker starts out reading all the tuples from the file that it wrote,
before attempting to steal tuples from the files written by other
participants, until there are none left anywhere. In the best case
they all write out and then read back in just their own files with
minimal contention, and contention rises as tuples are less evenly
distributed among participants, but we never quite get the best case
because the leader always leaves behind a bunch of batches for the
others to deal with when it quits early. Maybe I should separate all
the batch reader stuff into another patch so it doesn't clutter the
hash join code up so much? I will start reviewing Parallel Tuplesort
shortly, which includes some related ideas.

Some assorted notes on the status: I need to do some thinking about
the file cleanup logic: both explicit deletes at the earliest possible
time, and failure/error paths. Currently the creator of each file is
responsible for cleaning it up, but I guess if the creator aborts
early the file disappears underneath the others' feet, and then I
guess they might raise a confusing error report that races against the
root cause error report; I'm looking into that. Rescans and skew
buckets not finished yet. The new chunk-queue based
ExecScanHashTableForUnmatched isn't tested yet (it replaces and
earlier version that was doing a bucket-by-bucket parallel scan).
There are several places where I haven't changed the private hash
table code to match the shared version because I'm not sure about
that, in particular the idea of chunk-based accounting (which happens
to be convenient for this code, but I also believe it to be more
correct). I'm still trying to decide how to report the hash table
tuple count and size: possibly the grand totals. Generally I need to
do some tidying and provide a suite of queries that hits interesting
cases. I hope to move on these things fairly quickly now that I've
got the hash table resizing and batch sharing stuff working (a puzzle
that kept me very busy for a while) though I'm taking a break for a
bit to do some reviewing.

The test query I've been looking at recently is TPCH Q9. With scale
1GB and work_mem = 64KB, I get a query plan that includes three
different variants of Hash node: Hash (run in every backend, duplicate
hash tables), Shared Hash (run in just one backend, but allowed to use
the sum of work_mem of all the backends, so usually wins by avoiding
batching), and Parallel Shared Hash (run in parallel and using sum of
work_mem). As an anecdatum, I see around 2.5x speedup against master,
using only 2 workers in both cases, though it seems to be bimodal,
either 2x or 2.8x, which I expect has something to do with that leader
exit stuff and I'm looking into that.. More on performance soon.

Thanks for reading!

[1]: /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-hash-v3.patchapplication/octet-stream; name=parallel-hash-v3.patchDownload

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c762fb0..43e85fc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1023,7 +1023,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = sname = "Limit";
 			break;
 		case T_Hash:
-			pname = sname = "Hash";
+			if (((Hash *) plan)->shared_table)
+				pname = sname = "Shared Hash";
+			else
+				pname = sname = "Hash";
 			break;
 		default:
 			pname = sname = "???";
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 86d9fb5..361d56a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -27,6 +27,7 @@
 #include "executor/executor.h"
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
+#include "executor/nodeHashjoin.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/tqueue.h"
 #include "nodes/nodeFuncs.h"
@@ -203,6 +204,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 				ExecCustomScanEstimate((CustomScanState *) planstate,
 									   e->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinEstimate((HashJoinState *) planstate,
+									 e->pcxt);
+				break;
 			default:
 				break;
 		}
@@ -255,6 +260,9 @@ ExecParallelInitializeDSM(PlanState *planstate,
 				ExecCustomScanInitializeDSM((CustomScanState *) planstate,
 											d->pcxt);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeDSM((HashJoinState *) planstate,
+										  d->pcxt);
 			default:
 				break;
 		}
@@ -731,6 +739,10 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
 				ExecCustomScanInitializeWorker((CustomScanState *) planstate,
 											   toc);
 				break;
+			case T_HashJoinState:
+				ExecHashJoinInitializeWorker((HashJoinState *) planstate,
+											 toc);
+				break;
 			default:
 				break;
 		}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index b8edd36..5c402bb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -806,6 +806,9 @@ ExecShutdownNode(PlanState *node)
 		case T_GatherState:
 			ExecShutdownGather((GatherState *) node);
 			break;
+		case T_HashJoinState:
+			ExecShutdownHashJoin((HashJoinState *) node);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 11db08f..5301bc0 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -25,6 +25,7 @@
 #include <limits.h>
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "commands/tablespace.h"
 #include "executor/execdebug.h"
@@ -32,14 +33,17 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "port/atomics.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 #include "utils/lsyscache.h"
+#include "utils/probes.h"
 #include "utils/syscache.h"
 
-
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecHashShrink(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 					  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -47,8 +51,28 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 						uint32 hashvalue,
 						int bucketNumber);
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
+static void ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable);
+
+static HashJoinTuple next_tuple_in_bucket(HashJoinTable table,
+										  HashJoinTuple tuple);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int skew_bucket_no);
+static HashJoinTuple first_tuple_in_skew_bucket(HashJoinTable table,
+												int bucket_no);
+static void insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+									 HashJoinTuple tuple,
+									 dsa_pointer tuple_pointer);
+static void insert_tuple_into_skew_bucket(HashJoinTable table,
+										  int bucket_no,
+										  HashJoinTuple tuple,
+										  dsa_pointer tuple_pointer);
 
 static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc_shared(HashJoinTable hashtable, Size size,
+								dsa_pointer *chunk_shared,
+								bool secondary,
+								bool force);
+
 
 /* ----------------------------------------------------------------
  *		ExecHash
@@ -64,6 +88,98 @@ ExecHash(HashState *node)
 }
 
 /* ----------------------------------------------------------------
+ *		ExecHashCheckForEarlyExit
+ *
+ *		return true if this process needs to abandon work on the
+ *		hash join to avoid a deadlock
+ * ----------------------------------------------------------------
+ */
+bool
+ExecHashCheckForEarlyExit(HashJoinTable hashtable)
+{
+	/*
+	 * The golden rule of leader deadlock avoidance: since leader processes
+	 * have two separate roles, namely reading from worker queues AND executing
+	 * the same plan as workers, we must never allow a leader to wait for
+	 * workers if there is any possibility those workers have emitted tuples.
+	 * Otherwise we could get into a situation where a worker fills up its
+	 * output tuple queue and begins waiting for the leader to read, while
+	 * the leader is busy waiting for the worker.
+	 *
+	 * Parallel hash joins with shared tables are inherently susceptible to
+	 * such deadlocks because there are points at which all participants must
+	 * wait (you can't start check for unmatched tuples in the hash table until
+	 * probing has completed in all workers, etc).
+	 *
+	 * So we follow these rules:
+	 *
+	 * 1.  If there are workers participating, the leader MUST NOT not
+	 *     participate in any further work after probing the first batch, so
+	 *     that it never has to wait for workers that might have emitted
+	 *     tuples.
+	 *
+	 * 2.  If there are no workers participating, the leader MUST run all the
+	 *     batches to completion, because that's the only way for the join
+	 *     to complete.  There is no deadlock risk if there are no workers.
+	 *
+	 * 3.  Workers MUST NOT participate if the hashing phase has finished by
+	 *     the time they have joined, so that the leader can reliably determine
+	 *     whether there are any workers running when it comes to the point
+	 *     where it must choose between 1 and 2.
+	 *
+	 * In other words, if the leader makes it all the way through hashing and
+	 * probing before any workers show up, then the leader will run the whole
+	 * hash join on its own.  If workers do show up any time before hashing is
+	 * finished, the leader will stop executing the join after helping probe
+	 * the first batch.   In the unlikely event of the first worker showing up
+	 * after the leader has finished hashing, it will exit because it's too
+	 * late, the leader has already decided to do all the work alone.
+	 */
+
+	if (!IsParallelWorker())
+	{
+		/* Running in the leader process. */
+		if (BarrierPhase(&hashtable->shared->barrier) >= PHJ_PHASE_PROBING &&
+			hashtable->shared->at_least_one_worker)
+		{
+			/* Abandon ship due to rule 1.  There are workers running. */
+			TRACE_POSTGRESQL_HASH_LEADER_EARLY_EXIT();
+			return true;
+		}
+		else
+		{
+			/*
+			 * Continue processing due to rule 2.  There are no workers, and
+			 * any workers that show up later will abandon ship.
+			 */
+		}
+	}
+	else
+	{
+		/* Running in a worker process. */
+		if (hashtable->attached_at_phase < PHJ_PHASE_PROBING)
+		{
+			/*
+			 * Advertise that there are workers, so that the leader can
+			 * choose between rules 1 and 2.  It's OK that several workers can
+			 * write to this variable without immediately memory
+			 * synchronization, because the leader will only read it in a later
+			 * phase (see above).
+			 */
+			hashtable->shared->at_least_one_worker = true;
+		}
+		else
+		{
+			/* Abandon ship due to rule 3. */
+			TRACE_POSTGRESQL_HASH_WORKER_EARLY_EXIT();
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/* ----------------------------------------------------------------
  *		MultiExecHash
  *
  *		build hash table for hashjoin, doing partitioning if more
@@ -79,6 +195,7 @@ MultiExecHash(HashState *node)
 	TupleTableSlot *slot;
 	ExprContext *econtext;
 	uint32		hashvalue;
+	Barrier	   *barrier = NULL;
 
 	/* must provide our own instrumentation support */
 	if (node->ps.instrument)
@@ -90,6 +207,63 @@ MultiExecHash(HashState *node)
 	outerNode = outerPlanState(node);
 	hashtable = node->hashtable;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Synchronize parallel hash table builds.  At this stage we know that
+		 * the shared hash table has been created, but we don't know if our
+		 * peers are still in MultiExecHash and if so how far through.  We use
+		 * the phase to synchronize with them.
+		 */
+		barrier = &hashtable->shared->barrier;
+
+		switch (BarrierPhase(barrier))
+		{
+		case PHJ_PHASE_BEGINNING:
+			/* ExecHashTableCreate already handled this phase. */
+			Assert(false);
+		case PHJ_PHASE_CREATING:
+			/* Wait for serial phase, and then either hash or wait. */
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_CREATING))
+				goto hash;
+			else if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_HASHING:
+			/* Hashing is already underway.  Can we join in? */
+			if (node->ps.plan->parallel_aware)
+				goto hash;
+			else
+				goto post_hash;
+		case PHJ_PHASE_RESIZING:
+			/* Can't help with serial phase. */
+			goto post_resize;
+		case PHJ_PHASE_REBUCKETING:
+			/* Rebucketing is in progress.  Let's help do that. */
+			goto rebucket;
+		default:
+			/* The hash table building work is already finished. */
+			goto finish;
+		}
+	}
+
+ hash:
+	TRACE_POSTGRESQL_HASH_HASHING_START();
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Make sure our local hashtable is up-to-date so we can hash. */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		ExecHashUpdate(hashtable);
+
+		/*
+		 * Attach to the second barrier that is just used for coordinating
+		 * shrinking during the hashing phase, in case we run out of work_mem.
+		 */
+		BarrierAttach(&hashtable->shared->shrink_barrier);
+	}
+
 	/*
 	 * set expression context
 	 */
@@ -123,22 +297,106 @@ MultiExecHash(HashState *node)
 			else
 			{
 				/* Not subject to skew optimization, so insert normally */
-				ExecHashTableInsert(hashtable, slot, hashvalue);
+				ExecHashTableInsert(hashtable, slot, hashvalue, false);
 			}
-			hashtable->totalTuples += 1;
+			/*
+			 * Shared tuple counters are managed by dense_alloc_shared.  For
+			 * private hash tables we maintain the counter here.
+			 */
+			if (!HashJoinTableIsShared(hashtable))
+				hashtable->totalTuples += 1;
 		}
 	}
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Detach from the shrink barrier. */
+		BarrierDetach(&hashtable->shared->shrink_barrier);
+	}
+
+	TRACE_POSTGRESQL_HASH_HASHING_DONE();
+
+ post_hash:
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		bool elected_to_resize;
+
+		/*
+		 * Wait for all backends to finish hashing.  If only one worker is
+		 * running the hashing phase because of a non-partial inner plan, the
+		 * other workers will pile up here waiting.  If multiple worker are
+		 * hashing, they should finish close to each other in time.
+		 *
+		 * TODO: Even if only one backend is allowed to run the plan, other
+		 * backends might as well stand ready to help with rebatching work if
+		 * the need arises.  Maybe we need a way to 'arrive' at a barrier, but
+		 * not block, then a way to loop on another condition variable,
+		 * running ExecHashShrink each time we're woken, and break when all
+		 * partipants have arrived at the barrier (ie when
+		 * BarrierPhase(barrier) reports that the phase has advanced).
+		 */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_HASHING);
+		elected_to_resize = BarrierWait(barrier, WAIT_EVENT_HASH_HASHING);
+		/*
+		 * Resizing is a serial phase.  All but one should skip ahead to
+		 * rebucketing, but all workers should update their copy of the shared
+		 * tuple count with the final total first.
+		 */
+		/*
+		hashtable->totalTuples =
+			pg_atomic_read_u64(&hashtable->shared->total_primary_tuples);
+		*/
+		if (!elected_to_resize)
+			goto post_resize;
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+	}
+
 	/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-	if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-		ExecHashIncreaseNumBuckets(hashtable);
+	ExecHashIncreaseNumBuckets(hashtable);
+
+ post_resize:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+		BarrierWait(&hashtable->shared->barrier,
+					WAIT_EVENT_HASH_RESIZING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+	}
+
+ rebucket:
+	/* If the table was resized, insert tuples into the new buckets. */
+	ExecHashUpdate(hashtable);
+	ExecHashRebucket(hashtable);
 
 	/* Account for the buckets in spaceUsed (reported in EXPLAIN ANALYZE) */
-	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinTuple);
+	hashtable->spaceUsed += hashtable->nbuckets * sizeof(HashJoinBucketHead);
 	if (hashtable->spaceUsed > hashtable->spacePeak)
 		hashtable->spacePeak = hashtable->spaceUsed;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REBUCKETING);
+		BarrierWait(barrier, WAIT_EVENT_HASH_REBUCKETING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING);
+	}
+
+ finish:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * All hashing work has finished.  The other workers may be probing or
+		 * processing unmatched tuples for the initial batch, or dealing with
+		 * later batches.  The next synchronization point is in ExecHashJoin's
+		 * HJ_BUILD_HASHTABLE case, which will figure that out and synchronize
+		 * its local state machine with the parallel processing group's phase.
+		 */
+		Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/* must provide our own instrumentation support */
+	/* TODO: report only the tuples that WE hashed here? */
 	if (node->ps.instrument)
 		InstrStopNode(node->ps.instrument, hashtable->totalTuples);
 
@@ -243,10 +501,13 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
+ExecHashTableCreate(HashState *state, List *hashOperators, bool keepNulls)
 {
+	Hash	   *node;
 	HashJoinTable hashtable;
+	SharedHashJoinTable shared_hashtable;
 	Plan	   *outerNode;
+	size_t		space_allowed;
 	int			nbuckets;
 	int			nbatch;
 	int			num_skew_mcvs;
@@ -261,10 +522,15 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 * "outer" subtree of this node, but the inner relation of the hashjoin).
 	 * Compute the appropriate size of the hash table.
 	 */
+	node = (Hash *) state->ps.plan;
 	outerNode = outerPlan(node);
-
+	shared_hashtable = state->shared_table_data;
 	ExecChooseHashTableSize(outerNode->plan_rows, outerNode->plan_width,
 							OidIsValid(node->skewTable),
+							shared_hashtable != NULL,
+							shared_hashtable != NULL ?
+							shared_hashtable->planned_participants - 1 : 0,
+							&space_allowed,
 							&nbuckets, &nbatch, &num_skew_mcvs);
 
 	/* nbuckets must be a power of 2 */
@@ -301,11 +567,19 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	hashtable->outerBatchFile = NULL;
 	hashtable->spaceUsed = 0;
 	hashtable->spacePeak = 0;
-	hashtable->spaceAllowed = work_mem * 1024L;
+	hashtable->spaceAllowed = space_allowed;
 	hashtable->spaceUsedSkew = 0;
 	hashtable->spaceAllowedSkew =
 		hashtable->spaceAllowed * SKEW_WORK_MEM_PERCENT / 100;
-	hashtable->chunks = NULL;
+	hashtable->chunk = NULL;
+	hashtable->chunk_preload = NULL;
+	hashtable->chunks_to_rebucket = NULL;
+	hashtable->chunk_shared = InvalidDsaPointer;
+	hashtable->chunk_preload_shared = InvalidDsaPointer;
+	hashtable->area = state->ps.state->es_query_dsa;
+	hashtable->shared = state->shared_table_data;
+	hashtable->preloaded_spare_tuple = false;
+	hashtable->detached_early = false;
 
 #ifdef HJDEBUG
 	printf("Hashjoin %p: initial nbatch = %d, nbuckets = %d\n",
@@ -340,7 +614,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 
 	/*
 	 * Create temporary memory contexts in which to keep the hashtable working
-	 * storage.  See notes in executor/hashjoin.h.
+	 * storage if using private hash table.  See notes in executor/hashjoin.h.
 	 */
 	hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
 											   "HashTableContext",
@@ -368,23 +642,95 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 		PrepareTempTablespaces();
 	}
 
-	/*
-	 * Prepare context for the first-scan space allocations; allocate the
-	 * hashbucket array therein, and set each bucket "empty".
-	 */
-	MemoryContextSwitchTo(hashtable->batchCxt);
+	MemoryContextSwitchTo(oldcxt);
 
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier;
 
-	/*
-	 * Set up for skew optimization, if possible and there's a need for more
-	 * than one batch.  (In a one-batch join, there's no point in it.)
-	 */
-	if (nbatch > 1)
-		ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+		/*
+		 * Attach to the barrier.  The corresponding detach operation is in
+		 * ExecHashTableDestroy.
+		 */
+		barrier = &hashtable->shared->barrier;
+		hashtable->attached_at_phase = BarrierAttach(barrier);
 
-	MemoryContextSwitchTo(oldcxt);
+		/*
+		 * So far we have no idea whether there are any other participants, and
+		 * if so, what phase they are working on.  The only thing we care about
+		 * at this point is whether someone has already created the shared
+		 * hash table yet.  If not, one backend will be elected to do that
+		 * now.
+		 */
+		if (BarrierPhase(barrier) == PHJ_PHASE_BEGINNING)
+		{
+			if (BarrierWait(barrier, WAIT_EVENT_HASH_BEGINNING))
+			{
+				/* Serial phase: create the hash tables */
+				Size bytes;
+				HashJoinBucketHead *buckets;
+				int i;
+				SharedHashJoinTable shared;
+				dsa_area *area;
+
+				shared = hashtable->shared;
+				area = hashtable->area;
+				bytes = nbuckets * sizeof(HashJoinBucketHead);
+
+				/* Allocate the hash table buckets. */
+				shared->buckets = dsa_allocate(area, bytes);
+				if (!DsaPointerIsValid(shared->buckets))
+					ereport(ERROR,
+							(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+							 errmsg("out of memory")));
+
+				/* Initialize the hash table buckets to empty. */
+				buckets = dsa_get_address(area, shared->buckets);
+				for (i = 0; i < nbuckets; ++i)
+					dsa_pointer_atomic_init(&buckets[i].shared,
+											InvalidDsaPointer);
+
+				/* Initialize the rest of parallel_state. */
+				hashtable->shared->nbuckets = nbuckets;
+				hashtable->shared->nbatch = nbatch;
+				hashtable->shared->size = bytes;
+				hashtable->shared->size_preloaded = 0;
+				ExecHashJoinRewindBatches(hashtable, 0);
+
+				/* TODO: ExecHashBuildSkewHash */
+
+				/*
+				 * The backend-local pointers in hashtable will be set up by
+				 * ExecHashUpdate, at each point where they might have
+				 * changed.
+				 */
+			}
+			Assert(BarrierPhase(&hashtable->shared->barrier) ==
+				   PHJ_PHASE_CREATING);
+			/* The next synchronization point is in MultiExecHash. */
+		}
+	}
+	else
+	{
+		/*
+		 * Prepare context for the first-scan space allocations; allocate the
+		 * hashbucket array therein, and set each bucket "empty".
+		 */
+		MemoryContextSwitchTo(hashtable->batchCxt);
+
+		hashtable->buckets = (HashJoinBucketHead *)
+			palloc0(nbuckets * sizeof(HashJoinBucketHead));
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/*
+		 * Set up for skew optimization, if possible and there's a need for
+		 * more than one batch.  (In a one-batch join, there's no point in
+		 * it.)
+		 */
+		if (nbatch > 1)
+			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+	}
 
 	return hashtable;
 }
@@ -402,6 +748,8 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 
 void
 ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
+						bool shared, int parallel_workers,
+						size_t *space_allowed,
 						int *numbuckets,
 						int *numbatches,
 						int *num_skew_mcvs)
@@ -432,9 +780,15 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	inner_rel_bytes = ntuples * tupsize;
 
 	/*
-	 * Target in-memory hashtable size is work_mem kilobytes.
+	 * Target in-memory hashtable size is work_mem kilobytes.  Shared hash
+	 * tables are allowed to multiply work_mem by the number of participants,
+	 * since other non-shared memory based plans allow each participant to use
+	 * work_mem for the same total.
 	 */
 	hash_table_bytes = work_mem * 1024L;
+	if (shared && parallel_workers > 0)
+		hash_table_bytes *= parallel_workers + 1;	/* one for the leader */
+	*space_allowed = hash_table_bytes;
 
 	/*
 	 * If skew optimization is possible, estimate the number of skew buckets
@@ -481,8 +835,8 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * Note that both nbuckets and nbatch must be powers of 2 to make
 	 * ExecHashGetBucketAndBatch fast.
 	 */
-	max_pointers = (work_mem * 1024L) / sizeof(HashJoinTuple);
-	max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
+	max_pointers = (work_mem * 1024L) / sizeof(HashJoinBucketHead);
+	max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinBucketHead));
 	/* If max_pointers isn't a power of 2, must round it down to one */
 	mppow2 = 1L << my_log2(max_pointers);
 	if (max_pointers != mppow2)
@@ -504,7 +858,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * If there's not enough space to store the projected number of tuples and
 	 * the required bucket headers, we will need multiple batches.
 	 */
-	bucket_bytes = sizeof(HashJoinTuple) * nbuckets;
+	bucket_bytes = sizeof(HashJoinBucketHead) * nbuckets;
 	if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
 	{
 		/* We'll need multiple batches */
@@ -519,12 +873,12 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 		 * NTUP_PER_BUCKET tuples, whose projected size already includes
 		 * overhead for the hash code, pointer to the next tuple, etc.
 		 */
-		bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple));
+		bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinBucketHead));
 		lbuckets = 1L << my_log2(hash_table_bytes / bucket_size);
 		lbuckets = Min(lbuckets, max_pointers);
 		nbuckets = (int) lbuckets;
 		nbuckets = 1 << my_log2(nbuckets);
-		bucket_bytes = nbuckets * sizeof(HashJoinTuple);
+		bucket_bytes = nbuckets * sizeof(HashJoinBucketHead);
 
 		/*
 		 * Buckets are simple pointers to hashjoin tuples, while tupsize
@@ -564,6 +918,31 @@ ExecHashTableDestroy(HashJoinTable hashtable)
 {
 	int			i;
 
+	/* Detached, if we haven't already. */
+	if (HashJoinTableIsShared(hashtable) && !hashtable->detached_early)
+	{
+		Barrier *barrier = &hashtable->shared->barrier;
+
+		/*
+		 * TODO: Can we just detach if there is only one batch, but wait here
+		 * if there is more than one (to make sure batch files created by this
+		 * participant are not deleted)?  When detaching, the last one to
+		 * detach should do the cleanup work, and/or leave things in the right
+		 * state for rescanning.
+		 */
+
+		if (BarrierWait(barrier, WAIT_EVENT_HASH_DESTROY))
+		{
+			/* Serial: free the tables */
+			if (DsaPointerIsValid(hashtable->shared->buckets))
+			{
+				dsa_free(hashtable->area, hashtable->shared->buckets);
+				hashtable->shared->buckets = InvalidDsaPointer;
+			}
+		}
+		BarrierDetach(&hashtable->shared->barrier);
+	}
+
 	/*
 	 * Make sure all the temp files are closed.  We skip batch 0, since it
 	 * can't have any temp files (and the arrays might not even exist if
@@ -584,37 +963,13 @@ ExecHashTableDestroy(HashJoinTable hashtable)
 	pfree(hashtable);
 }
 
-/*
- * ExecHashIncreaseNumBatches
- *		increase the original number of batches in order to reduce
- *		current memory consumption
- */
 static void
-ExecHashIncreaseNumBatches(HashJoinTable hashtable)
+extend_batch_file_arrays(HashJoinTable hashtable, int nbatch)
 {
-	int			oldnbatch = hashtable->nbatch;
-	int			curbatch = hashtable->curbatch;
-	int			nbatch;
 	MemoryContext oldcxt;
-	long		ninmemory;
-	long		nfreed;
-	HashMemoryChunk oldchunks;
+	int oldnbatch = hashtable->nbatch;
 
-	/* do nothing if we've decided to shut off growth */
-	if (!hashtable->growEnabled)
-		return;
-
-	/* safety check to avoid overflow */
-	if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
-		return;
-
-	nbatch = oldnbatch * 2;
-	Assert(nbatch > 1);
-
-#ifdef HJDEBUG
-	printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
-		   hashtable, nbatch, hashtable->spaceUsed);
-#endif
+	TRACE_POSTGRESQL_HASH_INCREASE_BATCHES(nbatch);
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
@@ -641,9 +996,49 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 			   (nbatch - oldnbatch) * sizeof(BufFile *));
 	}
 
+	hashtable->nbatch = nbatch;
+
 	MemoryContextSwitchTo(oldcxt);
+}
 
-	hashtable->nbatch = nbatch;
+/*
+ * ExecHashIncreaseNumBatches
+ *		increase the original number of batches in order to reduce
+ *		current memory consumption
+ */
+static void
+ExecHashIncreaseNumBatches(HashJoinTable hashtable)
+{
+	int			oldnbatch = hashtable->nbatch;
+	int			curbatch = hashtable->curbatch;
+	int			nbatch;
+	long		ninmemory;
+	long		nfreed;
+	HashMemoryChunk oldchunks;
+
+	/*
+	 * TODO: Should private hash tables also switch to chunk-based memory
+	 * accounting, done in dense_alloc, and use ExecHashShrink?
+	 */
+	Assert(!HashJoinTableIsShared(hashtable));
+
+	/* do nothing if we've decided to shut off growth */
+	if (!hashtable->growEnabled)
+		return;
+
+	/* safety check to avoid overflow */
+	if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
+		return;
+
+	nbatch = oldnbatch * 2;
+	Assert(nbatch > 1);
+
+#ifdef HJDEBUG
+	printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
+		   hashtable, nbatch, hashtable->spaceUsed);
+#endif
+
+	extend_batch_file_arrays(hashtable, nbatch);
 
 	/*
 	 * Scan through the existing hash table entries and dump out any that are
@@ -661,7 +1056,7 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;
 
 		hashtable->buckets = repalloc(hashtable->buckets,
-								sizeof(HashJoinTuple) * hashtable->nbuckets);
+								sizeof(HashJoinBucketHead) * hashtable->nbuckets);
 	}
 
 	/*
@@ -669,14 +1064,14 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	 * buckets now and not have to keep track which tuples in the buckets have
 	 * already been processed. We will free the old chunks as we go.
 	 */
-	memset(hashtable->buckets, 0, sizeof(HashJoinTuple) * hashtable->nbuckets);
-	oldchunks = hashtable->chunks;
-	hashtable->chunks = NULL;
+	memset(hashtable->buckets, 0, sizeof(HashJoinBucketHead) * hashtable->nbuckets);
+	oldchunks = hashtable->chunk;
+	hashtable->chunk = NULL;
 
 	/* so, let's scan through the old chunks, and all tuples in each chunk */
 	while (oldchunks != NULL)
 	{
-		HashMemoryChunk nextchunk = oldchunks->next;
+		HashMemoryChunk nextchunk = oldchunks->next.private;
 
 		/* position within the buffer (up to oldchunks->used) */
 		size_t		idx = 0;
@@ -699,20 +1094,23 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 				/* keep tuple in memory - copy it into the new chunk */
 				HashJoinTuple copyTuple;
 
-				copyTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+				copyTuple = (HashJoinTuple)
+					dense_alloc(hashtable, hashTupleSize);
 				memcpy(copyTuple, hashTuple, hashTupleSize);
 
 				/* and add it back to the appropriate bucket */
-				copyTuple->next = hashtable->buckets[bucketno];
-				hashtable->buckets[bucketno] = copyTuple;
+				insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+										 InvalidDsaPointer);
 			}
 			else
 			{
 				/* dump it out */
 				Assert(batchno > curbatch);
-				ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple),
+				ExecHashJoinSaveTuple(hashtable,
+									  HJTUPLE_MINTUPLE(hashTuple),
 									  hashTuple->hashvalue,
-									  &hashtable->innerBatchFile[batchno]);
+									  batchno,
+									  true);
 
 				hashtable->spaceUsed -= hashTupleSize;
 				nfreed++;
@@ -758,8 +1156,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 static void
 ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 {
-	HashMemoryChunk chunk;
-
 	/* do nothing if not an increase (it's called increase for a reason) */
 	if (hashtable->nbuckets >= hashtable->nbuckets_optimal)
 		return;
@@ -780,45 +1176,412 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
 	 * Just reallocate the proper number of buckets - we don't need to walk
 	 * through them - we can walk the dense-allocated chunks (just like in
 	 * ExecHashIncreaseNumBatches, but without all the copying into new
-	 * chunks)
+	 * chunks): see ExecHashRebucket, which must be called next.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Size bytes;
+		int i;
+
+		/* Serial phase: only one backend reallocates. */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_RESIZING);
+
+		/* Free the old hash table. */
+		dsa_free(hashtable->area, hashtable->shared->buckets);
+
+		/* Allocate replacement. */
+		bytes = hashtable->nbuckets * sizeof(HashJoinBucketHead);
+		hashtable->shared->buckets = dsa_allocate(hashtable->area, bytes);
+		if (!DsaPointerIsValid(hashtable->shared->buckets))
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("out of memory")));
+
+		/* Initialize empty hash table buckets. */
+		hashtable->buckets =
+			dsa_get_address(hashtable->area,
+							hashtable->shared->buckets);
+		for (i = 0; i < hashtable->nbuckets; ++i)
+			dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+									 InvalidDsaPointer);
+		hashtable->shared->nbuckets = hashtable->nbuckets;
+
+		/* Update size accounting. */
+		hashtable->shared->size += bytes / 2;
+
+		/* Move all chunks to the rebucket list. */
+		hashtable->shared->chunks_to_rebucket = hashtable->shared->chunks;
+		hashtable->shared->chunks = InvalidDsaPointer;
+	}
+	else
+	{
+		hashtable->buckets =
+			(HashJoinBucketHead *) repalloc(hashtable->buckets,
+											hashtable->nbuckets * sizeof(HashJoinBucketHead));
+
+		memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinBucketHead));
+		/* Move all chunks to the rebucket list. */
+		hashtable->chunks_to_rebucket = hashtable->chunk;
+		hashtable->chunk = NULL;
+	}
+}
+
+/*
+ * Pop a memory chunk from a given list.  Returns a backend-local pointer to
+ * the chunk, or NULL if the list is empty.  Also sets *chunk_out to the
+ * dsa_pointer to the chunk.
+ */
+static HashMemoryChunk
+ExecHashPopChunk(HashJoinTable hashtable,
+				 dsa_pointer *chunk_out,
+				 dsa_pointer *head)
+{
+	HashMemoryChunk chunk;
+
+	Assert(LWLockHeldByMe(&hashtable->shared->chunk_lock));
+
+	if (!DsaPointerIsValid(*head))
+		return NULL;
+
+	*chunk_out = *head;
+	chunk = (HashMemoryChunk)
+		dsa_get_address(hashtable->area, *chunk_out);
+	*head = chunk->next.shared;
+
+	return chunk;
+}
+
+/*
+ * Push a shared memory chunk onto a given list.
+ */
+static void
+ExecHashPushChunk(HashJoinTable hashtable,
+				  HashMemoryChunk chunk,
+				  dsa_pointer chunk_shared,
+				  dsa_pointer *head)
+{
+	Assert(LWLockHeldByMeInMode(&hashtable->shared->chunk_lock, LW_EXCLUSIVE));
+	Assert(chunk == dsa_get_address(hashtable->area, chunk_shared));
+
+	chunk->next.shared = *head;
+	*head = chunk_shared;
+}
+
+/*
+ * ExecHashRebucket
+ *		insert the tuples from hashtable->chunks_to_rebucket into the hashtable
+ */
+void
+ExecHashRebucket(HashJoinTable hashtable)
+{
+	HashMemoryChunk chunk;
+	dsa_pointer chunk_shared;
+	int chunks_processed = 0;
+
+	TRACE_POSTGRESQL_HASH_REBUCKET_START();
+
+	/*
+	 * Scan through all tuples in all chunks in the rebucket list to rebuild
+	 * the hash table.
+	 */
+	if (HashJoinTableIsShared(hashtable))
+	{
+		LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+		chunk =
+			ExecHashPopChunk(hashtable, &chunk_shared,
+							 &hashtable->shared->chunks_to_rebucket);
+		LWLockRelease(&hashtable->shared->chunk_lock);
+	}
+	else
+		chunk = hashtable->chunks_to_rebucket;
+	while (chunk != NULL)
+	{
+		/* process all tuples stored in this chunk */
+		size_t		idx = 0;
+
+		while (idx < chunk->used)
+		{
+			HashJoinTuple hashTuple = (HashJoinTuple) (chunk->data + idx);
+			dsa_pointer hashTuple_shared = chunk_shared +
+				offsetof(HashMemoryChunkData, data) + idx;
+			int			bucketno;
+			int			batchno;
+
+			ExecHashGetBucketAndBatch(hashtable, hashTuple->hashvalue,
+									  &bucketno, &batchno);
+
+			/* add the tuple to the proper bucket */
+			insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+									 hashTuple_shared);
+
+			/* advance index past the tuple */
+			idx += MAXALIGN(HJTUPLE_OVERHEAD +
+							HJTUPLE_MINTUPLE(hashTuple)->t_len);
+		}
+		++chunks_processed;
+
+		/* Push chunk back onto the chunk list and move to the next. */
+		if (HashJoinTableIsShared(hashtable))
+		{
+			LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+			ExecHashPushChunk(hashtable, chunk, chunk_shared,
+							  &hashtable->shared->chunks);
+			chunk =
+				ExecHashPopChunk(hashtable, &chunk_shared,
+								 &hashtable->shared->chunks_to_rebucket);
+			LWLockRelease(&hashtable->shared->chunk_lock);
+		}
+		else
+		{
+			HashMemoryChunk next = chunk->next.private;
+
+			chunk->next.private = hashtable->chunk;
+			hashtable->chunk = chunk;
+			chunk = next;
+		}
+	}
+
+	TRACE_POSTGRESQL_HASH_REBUCKET_DONE(chunks_processed);
+}
+
+static void
+ExecHashTableComputeOptimalBuckets(HashJoinTable hashtable)
+{
+	double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+
+	/*
+	 * Guard against integer overflow and alloc size overflow.  The
+	 * MaxAllocSize limitation doesn't really apply for shared hash tables,
+	 * since DSA has no such limit, but for now let's apply the same limit.
 	 */
-	hashtable->buckets =
-		(HashJoinTuple *) repalloc(hashtable->buckets,
-								hashtable->nbuckets * sizeof(HashJoinTuple));
+	while (ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET) &&
+		   hashtable->nbuckets_optimal <= INT_MAX / 2 &&
+		   hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinBucketHead))
+	{
+		hashtable->nbuckets_optimal *= 2;
+		hashtable->log2_nbuckets_optimal += 1;
+	}
+}
+
+/*
+ * Process the queue of chunks whose tuples need to be redistributed into the
+ * correct batches until it is empty.  Hopefully this will shrink the hash
+ * table, keeping about half of the tuples in memory and sending the rest to a
+ * future batch.
+ */
+static void
+ExecHashShrink(HashJoinTable hashtable)
+{
+	Size size_before_shrink = 0;
+	Size tuples_in_memory = 0;
+	Size tuples_written_out = 0;
+	dsa_pointer chunk_shared;
+	HashMemoryChunk chunk;
+	bool elected_to_decide = false;
+
+	TRACE_POSTGRESQL_HASH_SHRINK_START(hashtable->nbatch);
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Since a newly launched participant could arrive while shrinking is
+		 * already underway, we need to be able to jump to the correct place
+		 * in this function.
+		 */
+		switch (BarrierPhase(&hashtable->shared->shrink_barrier))
+		{
+		case PHJ_SHRINK_PHASE_BEGINNING: /* likely case */
+			break;
+		case PHJ_SHRINK_PHASE_CLEARING:
+			goto clearing;
+		case PHJ_SHRINK_PHASE_WORKING:
+			goto working;
+		case PHJ_SHRINK_PHASE_DECIDING:
+			goto deciding;
+		}
+
+		/*
+		 * We wait until all participants have reached this point.  We need to
+		 * do that because we can't clear the hash table if any partipicant is
+		 * still inserting tuples into it, and we can't modify chunks that any
+		 * participant is still writing into.
+		 */
+		if (BarrierWait(&hashtable->shared->shrink_barrier,
+						WAIT_EVENT_HASH_SHRINKING1))
+		{
+			/* TODO: could also resize hash table here! */
+
+			/* Serial phase: one participant clears the hash table. */
+			memset(hashtable->buckets, 0,
+				   hashtable->nbuckets * sizeof(HashJoinBucketHead));
 
-	memset(hashtable->buckets, 0, hashtable->nbuckets * sizeof(HashJoinTuple));
+			/*
+			 * This participant will also make the decision about whether to
+			 * disable further attempts to shrink.
+			 */
+			size_before_shrink = hashtable->shared->size;
+			elected_to_decide = true;
+		}
+	clearing:
+		/* Wait until hash table is cleared. */
+		BarrierWait(&hashtable->shared->shrink_barrier,
+					WAIT_EVENT_HASH_SHRINKING2);
+
+		Assert(hashtable->shared->nbatch == hashtable->nbatch);
+	}
+	else
+	{
+		/* Clear the hash table. */
+		memset(hashtable->buckets, 0,
+			   sizeof(HashJoinBucketHead) * hashtable->nbuckets);
+	}
+
+	/* Pop first chunk from the shrink queue. */
+	if (HashJoinTableIsShared(hashtable))
+	{
+	working:
+		LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+		chunk = ExecHashPopChunk(hashtable, &chunk_shared,
+								 &hashtable->shared->chunks_to_shrink);
+		LWLockRelease(&hashtable->shared->chunk_lock);
+	}
+	else
+		chunk = hashtable->chunks_to_shrink;
+
+	/* Process queue until empty. */
+	while (chunk != NULL)
+	{
+		Size idx = 0;
+
+		/* Process all tuples stored in this chunk. */
+		while (idx < chunk->used)
+		{
+			HashJoinTuple hashTuple = (HashJoinTuple) (chunk->data + idx);
+			MinimalTuple tuple = HJTUPLE_MINTUPLE(hashTuple);
+			dsa_pointer	copyTupleShared = InvalidDsaPointer;
+			int			hashTupleSize = (HJTUPLE_OVERHEAD + tuple->t_len);
+			int			bucketno;
+			int			batchno;
+
+			ExecHashGetBucketAndBatch(hashtable, hashTuple->hashvalue,
+									  &bucketno, &batchno);
+
+			if (batchno == hashtable->curbatch)
+			{
+				/* keep tuple in memory - copy it into the new chunk */
+				HashJoinTuple copyTuple;
+
+				if (HashJoinTableIsShared(hashtable))
+					copyTuple = (HashJoinTuple)
+						dense_alloc_shared(hashtable, hashTupleSize,
+										   &copyTupleShared, false, false);
+				else
+					copyTuple = (HashJoinTuple)
+						dense_alloc(hashtable, hashTupleSize);
+				memcpy(copyTuple, hashTuple, hashTupleSize);
+
+				/* and add it back to the appropriate bucket */
+				insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+										 copyTupleShared);
+				++tuples_in_memory;
+			}
+			else
+			{
+				/* dump it out */
+				Assert(batchno > hashtable->curbatch);
+				ExecHashJoinSaveTuple(hashtable,
+									  HJTUPLE_MINTUPLE(hashTuple),
+									  hashTuple->hashvalue,
+									  batchno,
+									  true);
+
+				hashtable->spaceUsed -= hashTupleSize;
+				++tuples_written_out;
+			}
 
-	/* scan through all tuples in all chunks to rebuild the hash table */
-	for (chunk = hashtable->chunks; chunk != NULL; chunk = chunk->next)
-	{
-		/* process all tuples stored in this chunk */
-		size_t		idx = 0;
+			/* next tuple in this chunk */
+			idx += MAXALIGN(hashTupleSize);
+		}
 
-		while (idx < chunk->used)
+		/* Free chunk and pop next from the shrink queue. */
+		if (HashJoinTableIsShared(hashtable))
 		{
-			HashJoinTuple hashTuple = (HashJoinTuple) (chunk->data + idx);
-			int			bucketno;
-			int			batchno;
+			Size size = chunk->maxlen + offsetof(HashMemoryChunkData, data);
+
+			TRACE_POSTGRESQL_HASH_FREE_CHUNK(size);
+			dsa_free(hashtable->area, chunk_shared);
+
+			LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+			Assert(hashtable->shared->size > size);
+			hashtable->shared->size -= size;
+			hashtable->shared->tuples_in_memory += tuples_in_memory;
+			hashtable->shared->tuples_written_out += tuples_written_out;
+			tuples_in_memory = 0;
+			tuples_written_out = 0;
+			chunk = ExecHashPopChunk(hashtable, &chunk_shared,
+									 &hashtable->shared->chunks_to_shrink);
+			LWLockRelease(&hashtable->shared->chunk_lock);
+		}
+		else
+		{
+			HashMemoryChunk next = chunk->next.private;
 
-			ExecHashGetBucketAndBatch(hashtable, hashTuple->hashvalue,
-									  &bucketno, &batchno);
+			pfree(chunk);
+			chunk = next;
+		}
+	}
 
-			/* add the tuple to the proper bucket */
-			hashTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = hashTuple;
+	/* Decide if shrinking actually reduced memory usage. */
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Wait until all have finished shrinking chunks.  We need to do that
+		 * because we need the total tuple counts before we can decide whether
+		 * to prevent further attempts at shrinking.
+		 */
+		BarrierWait(&hashtable->shared->shrink_barrier,
+					WAIT_EVENT_HASH_SHRINKING3);
 
-			/* advance index past the tuple */
-			idx += MAXALIGN(HJTUPLE_OVERHEAD +
-							HJTUPLE_MINTUPLE(hashTuple)->t_len);
+		if (elected_to_decide)
+		{
+			/* Serial phase: one participant decides. */
+			if (hashtable->shared->tuples_in_memory == 0 ||
+				hashtable->shared->tuples_written_out == 0)
+			{
+				TRACE_POSTGRESQL_HASH_SHRINK_DISABLED();
+				hashtable->shared->shrinking_enabled = false;
+			}
+
+			TRACE_POSTGRESQL_HASH_SHRINK_STATS(hashtable->shared->tuples_in_memory,
+											   hashtable->shared->tuples_written_out,
+											   size_before_shrink,
+											   hashtable->shared->size);
+		}
+	deciding:
+		/* Wait for above decision to be made. */
+		BarrierWaitSet(&hashtable->shared->shrink_barrier,
+					   PHJ_SHRINK_PHASE_BEGINNING,
+					   WAIT_EVENT_HASH_SHRINKING4);
+	}
+	else
+	{
+		if (tuples_in_memory == 0 || tuples_written_out == 0)
+		{
+			TRACE_POSTGRESQL_HASH_SHRINK_DISABLED();
+			hashtable->growEnabled = false;
 		}
 	}
-}
 
+	TRACE_POSTGRESQL_HASH_SHRINK_DONE();
+}
 
 /*
  * ExecHashTableInsert
  *		insert a tuple into the hash table depending on the hash value
- *		it may just go to a temp file for later batches
+ *		it may just go to a temp file for later batches; if 'preload' is
+ *		then it may be loaded into a chunk but not actually inserted yet;
+ *		return true on success, false if we ran out of work_mem
  *
  * Note: the passed TupleTableSlot may contain a regular, minimal, or virtual
  * tuple; the minimal case in particular is certain to happen while reloading
@@ -826,10 +1589,11 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
  * case by not forcing the slot contents into minimal form; not clear if it's
  * worth the messiness required.
  */
-void
+bool
 ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue)
+					uint32 hashvalue,
+					bool preload)
 {
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	int			bucketno;
@@ -839,20 +1603,61 @@ ExecHashTableInsert(HashJoinTable hashtable,
 							  &bucketno, &batchno);
 
 	/*
-	 * decide whether to put the tuple in the hash table or a temp file
+	 * decide whether to put the tuple in memory or in a temp file
 	 */
-	if (batchno == hashtable->curbatch)
+	if (batchno == hashtable->curbatch + (preload ? 1 : 0))
 	{
 		/*
 		 * put the tuple in hash table
 		 */
 		HashJoinTuple hashTuple;
 		int			hashTupleSize;
-		double		ntuples = (hashtable->totalTuples - hashtable->skewTuples);
+		dsa_pointer hashTuple_shared = InvalidDsaPointer;
 
 		/* Create the HashJoinTuple */
 		hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-		hashTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
+
+	retry:
+		if (HashJoinTableIsShared(hashtable))
+			hashTuple = (HashJoinTuple)
+				dense_alloc_shared(hashtable, hashTupleSize,
+								   &hashTuple_shared, preload, true);
+		else
+			hashTuple = (HashJoinTuple)
+				dense_alloc(hashtable, hashTupleSize);
+
+		/* Check for failure for allocate. */
+		if (!hashTuple)
+		{
+			if (preload)
+			{
+				/*
+				 * There is no more work_mem into which to preload tuples for
+				 * the next batch, so tell caller to stop doing that.
+				 */
+				Assert(HashJoinTableIsShared(hashtable));
+				return false;
+			}
+			else
+			{
+				/*
+				 * Either dense_alloc_shared has decided that we should
+				 * increase the number of batches or another participant has
+				 * already decided to do that, so we should go and help shrink
+				 * the hash table by sending tuples to future batches.
+				 */
+				Assert(HashJoinTableIsShared(hashtable));
+				ExecHashShrink(hashtable);
+
+				/*
+				 * Try again.  Hopefully memory has been freed up, or we've
+				 * decided to stop respecting work_mem because increasing the
+				 * number of batches isn't helping (large numbers of tuples
+				 * with the same hash value can't be separated).
+				 */
+				goto retry;
+			}
+		}
 
 		hashTuple->hashvalue = hashvalue;
 		memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
@@ -865,33 +1670,32 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		 */
 		HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
-		/* Push it onto the front of the bucket's list */
-		hashTuple->next = hashtable->buckets[bucketno];
-		hashtable->buckets[bucketno] = hashTuple;
+		/* Push it onto the front of the bucket's list, unless preloading */
+		if (!preload)
+			insert_tuple_into_bucket(hashtable, bucketno, hashTuple,
+									 hashTuple_shared);
 
 		/*
 		 * Increase the (optimal) number of buckets if we just exceeded the
 		 * NTUP_PER_BUCKET threshold, but only when there's still a single
 		 * batch.
 		 */
-		if (hashtable->nbatch == 1 &&
-			ntuples > (hashtable->nbuckets_optimal * NTUP_PER_BUCKET))
-		{
-			/* Guard against integer overflow and alloc size overflow */
-			if (hashtable->nbuckets_optimal <= INT_MAX / 2 &&
-				hashtable->nbuckets_optimal * 2 <= MaxAllocSize / sizeof(HashJoinTuple))
-			{
-				hashtable->nbuckets_optimal *= 2;
-				hashtable->log2_nbuckets_optimal += 1;
-			}
-		}
+		if (hashtable->nbatch == 1)
+			ExecHashTableComputeOptimalBuckets(hashtable);
+
+		/*
+		 * TODO: Get rid of the following code, and use the same pattern as
+		 * above, namely let dense_alloc count chunk size (it's more
+		 * accurate!) and let it tell you when you need to back off and
+		 * ExecHashShrink?
+		 */
 
 		/* Account for space used, and back off if we've used too much */
 		hashtable->spaceUsed += hashTupleSize;
 		if (hashtable->spaceUsed > hashtable->spacePeak)
 			hashtable->spacePeak = hashtable->spaceUsed;
 		if (hashtable->spaceUsed +
-			hashtable->nbuckets_optimal * sizeof(HashJoinTuple)
+			hashtable->nbuckets_optimal * sizeof(HashJoinBucketHead)
 			> hashtable->spaceAllowed)
 			ExecHashIncreaseNumBatches(hashtable);
 	}
@@ -900,11 +1704,15 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		/*
 		 * put the tuple into a temp file for later batches
 		 */
-		Assert(batchno > hashtable->curbatch);
-		ExecHashJoinSaveTuple(tuple,
+		Assert(batchno > hashtable->curbatch + (preload ? 1 : 0));
+		ExecHashJoinSaveTuple(hashtable,
+							  tuple,
 							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+							  batchno,
+							  true);
 	}
+
+	return true;
 }
 
 /*
@@ -1047,6 +1855,134 @@ ExecHashGetBucketAndBatch(HashJoinTable hashtable,
 }
 
 /*
+ * Update the local hashtable with the current pointers and sizes from
+ * hashtable->parallel_state.
+ */
+void
+ExecHashUpdate(HashJoinTable hashtable)
+{
+	Barrier *barrier;
+
+	if (!HashJoinTableIsShared(hashtable))
+		return;
+
+	barrier = &hashtable->shared->barrier;
+
+	/*
+	 * This should only be called in a phase when the hash table is not being
+	 * mutated (ie resized, swapped etc).
+	 */
+	Assert(!PHJ_PHASE_MUTATING_TABLE(
+		BarrierPhase(&hashtable->shared->barrier)));
+
+	/* The hash table. */
+	hashtable->buckets = (HashJoinBucketHead *)
+		dsa_get_address(hashtable->area, hashtable->shared->buckets);
+	hashtable->nbuckets = hashtable->shared->nbuckets;
+	/* TODO nbatch? */
+	hashtable->log2_nbuckets = my_log2(hashtable->nbuckets);
+
+	hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(BarrierPhase(barrier));
+}
+
+/*
+ * Get the next tuple in the same bucket as 'tuple'.
+ */
+static HashJoinTuple
+next_tuple_in_bucket(HashJoinTable table, HashJoinTuple tuple)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area, tuple->next.shared);
+	else
+		return tuple->next.private;
+}
+
+/*
+ * Get the first tuple in a given skew bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_skew_bucket(HashJoinTable table, int skew_bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+		return (HashJoinTuple)
+			dsa_get_address(table->area,
+							table->skewBucket[skew_bucket_no]->tuples.shared);
+	else
+		return table->skewBucket[skew_bucket_no]->tuples.private;
+}
+
+/*
+ * Get the first tuple in a given bucket identified by number.
+ */
+static HashJoinTuple
+first_tuple_in_bucket(HashJoinTable table, int bucket_no)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		dsa_pointer p =
+			dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+		return (HashJoinTuple) dsa_get_address(table->area, p);
+	}
+	else
+		return table->buckets[bucket_no].private;
+}
+
+/*
+ * Insert a tuple at the front of a given bucket identified by number.  For
+ * shared hash joins, tuple_shared must be provided, pointing to the tuple in
+ * the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_bucket(HashJoinTable table, int bucket_no,
+						 HashJoinTuple tuple, dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		Assert(tuple == dsa_get_address(table->area, tuple_shared));
+		for (;;)
+		{
+			tuple->next.shared =
+				dsa_pointer_atomic_read(&table->buckets[bucket_no].shared);
+			if (dsa_pointer_atomic_compare_exchange(&table->buckets[bucket_no].shared,
+													&tuple->next.shared,
+													tuple_shared))
+				break;
+		}
+	}
+	else
+	{
+		tuple->next.private = table->buckets[bucket_no].private;
+		table->buckets[bucket_no].private = tuple;
+	}
+}
+
+/*
+ * Insert a tuple at the front of a given skew bucket identified by number.
+ * For shared hash joins, tuple_shared must be provided, pointing to the tuple
+ * in the dsa_area backing the table.  For private hash joins, it should be
+ * InvalidDsaPointer.
+ */
+static void
+insert_tuple_into_skew_bucket(HashJoinTable table, int skew_bucket_no,
+							  HashJoinTuple tuple,
+							  dsa_pointer tuple_shared)
+{
+	if (HashJoinTableIsShared(table))
+	{
+		tuple->next.shared =
+			table->skewBucket[skew_bucket_no]->tuples.shared;
+		table->skewBucket[skew_bucket_no]->tuples.shared = tuple_shared;
+	}
+	else
+	{
+		tuple->next.private = table->skewBucket[skew_bucket_no]->tuples.private;
+		table->skewBucket[skew_bucket_no]->tuples.private = tuple;
+	}
+}
+
+/*
  * ExecScanHashBucket
  *		scan a hash bucket for matches to the current outer tuple
  *
@@ -1073,11 +2009,12 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	 * otherwise scan the standard hashtable bucket.
 	 */
 	if (hashTuple != NULL)
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	else if (hjstate->hj_CurSkewBucketNo != INVALID_SKEW_BUCKET_NO)
-		hashTuple = hashtable->skewBucket[hjstate->hj_CurSkewBucketNo]->tuples;
+		hashTuple = first_tuple_in_skew_bucket(hashtable,
+											   hjstate->hj_CurSkewBucketNo);
 	else
-		hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
+		hashTuple = first_tuple_in_bucket(hashtable, hjstate->hj_CurBucketNo);
 
 	while (hashTuple != NULL)
 	{
@@ -1101,7 +2038,7 @@ ExecScanHashBucket(HashJoinState *hjstate,
 			}
 		}
 
-		hashTuple = hashTuple->next;
+		hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 	}
 
 	/*
@@ -1144,6 +2081,81 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_UNMATCHED_BATCH(hashtable->curbatch));
+
+		/*
+		 * For the parallel verison, we'll let each participant pull chunks
+		 * from the queue to work on independently.
+		 */
+		for (;;)
+		{
+			/* Do we need a new chunk? */
+			if (hashtable->chunk == NULL)
+			{
+				dsa_pointer chunk_shared;
+
+				/*
+				 * Try to pop a chunk from the unmatched queue, and put it
+				 * back on the main chunks list.
+				 */
+				LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+				hashtable->chunk =
+					ExecHashPopChunk(hashtable, &chunk_shared,
+									 &hashtable->shared->chunks_unmatched);
+				if (hashtable->chunk != NULL)
+					ExecHashPushChunk(hashtable, hashtable->chunk,
+									  chunk_shared,
+									  &hashtable->shared->chunks);
+				LWLockRelease(&hashtable->shared->chunk_lock);
+
+				/* If no more chunks in the queue: we're done. */
+				if (hashtable->chunk == NULL)
+					return false;
+
+				hashtable->chunk_unmatched_pos = 0;
+			}
+
+			/* Does the current chunk have any more tuples? */
+			if (hashtable->chunk_unmatched_pos >= hashtable->chunk->used)
+			{
+				/* Try a new chunk. */
+				hashtable->chunk = NULL;
+				continue;
+			}
+			hashTuple = (HashJoinTuple)
+				hashtable->chunk->data + hashtable->chunk_unmatched_pos;
+
+			/* Move to the next tuple in this chunk. */
+			hashtable->chunk_unmatched_pos +=
+				HJTUPLE_OVERHEAD + HJTUPLE_MINTUPLE(hashTuple)->t_len;
+
+			/* Is it unmatched? */
+			if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(hashTuple)))
+			{
+				TupleTableSlot *inntuple;
+
+				/* insert hashtable's tuple into exec slot */
+				inntuple = ExecStoreMinimalTuple(HJTUPLE_MINTUPLE(hashTuple),
+												 hjstate->hj_HashTupleSlot,
+												 false);		/* do not pfree */
+				econtext->ecxt_innertuple = inntuple;
+
+				/*
+				 * Reset temp memory each time; although this function doesn't
+				 * do any qual eval, the caller will, so let's keep it
+				 * parallel to ExecScanHashBucket.
+				 */
+				ResetExprContext(econtext);
+
+				hjstate->hj_CurTuple = hashTuple;
+				return true;
+			}
+		}
+	}
+
 	for (;;)
 	{
 		/*
@@ -1152,21 +2164,21 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 		 * bucket.
 		 */
 		if (hashTuple != NULL)
-			hashTuple = hashTuple->next;
-		else if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		{
-			hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
-			hjstate->hj_CurBucketNo++;
-		}
-		else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
-		{
-			int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
+			if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
+				hashTuple = first_tuple_in_bucket(hashtable,
+												  hjstate->hj_CurBucketNo++);
+			else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
+			{
+				int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
 
-			hashTuple = hashtable->skewBucket[j]->tuples;
-			hjstate->hj_CurSkewBucketNo++;
+				hashTuple = first_tuple_in_skew_bucket(hashtable, j);
+				hjstate->hj_CurSkewBucketNo++;
+			}
+			else
+				break;				/* finished all buckets */
 		}
-		else
-			break;				/* finished all buckets */
 
 		while (hashTuple != NULL)
 		{
@@ -1191,7 +2203,7 @@ ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 				return true;
 			}
 
-			hashTuple = hashTuple->next;
+			hashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		}
 	}
 
@@ -1212,6 +2224,59 @@ ExecHashTableReset(HashJoinTable hashtable)
 	MemoryContext oldcxt;
 	int			nbuckets = hashtable->nbuckets;
 
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Wait for all workers to finish accessing the hash table. */
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_UNMATCHED);
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_UNMATCHED))
+		{
+			/* Serial phase: set up hash table for new batch. */
+			int i;
+
+			Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+				   PHJ_SUBPHASE_PROMOTING);
+
+			/* Clear the hash table. */
+			for (i = 0; i < nbuckets; ++i)
+				dsa_pointer_atomic_write(&hashtable->buckets[i].shared,
+										 InvalidDsaPointer);
+
+			/* Free all the chunks. */
+			/* TODO: Put them on a freelist instead?  Better than making one backend free them all! */
+			while (DsaPointerIsValid(hashtable->shared->chunks))
+			{
+				HashMemoryChunk chunk = (HashMemoryChunk)
+					dsa_get_address(hashtable->area, hashtable->shared->chunks);
+				dsa_pointer next = chunk->next.shared;
+
+				dsa_free(hashtable->area, hashtable->shared->chunks);
+				hashtable->shared->chunks = next;
+			}
+
+			/* Any preloaded chunks for the next batch need to be bucketed. */
+			hashtable->shared->chunks_to_rebucket =
+				hashtable->shared->chunks_preloaded;
+			hashtable->shared->chunks_preloaded = InvalidDsaPointer;
+
+			/* Update the hash table size: it now has the preloaded chunks. */
+			hashtable->shared->size =
+				(hashtable->nbuckets * sizeof(HashJoinBucketHead)) +
+				hashtable->shared->size_preloaded;
+			hashtable->shared->size_preloaded = 0;
+		}
+		/* Wait again, so that all workers now have the new table. */
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_PROMOTING);
+		Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+			   PHJ_SUBPHASE_LOADING);
+		ExecHashUpdate(hashtable);
+
+		/* Forget the current chunks. */
+		hashtable->chunk = NULL;
+		hashtable->chunk_preload = NULL;
+		return;
+	}
+
 	/*
 	 * Release all the hash buckets and tuples acquired in the prior pass, and
 	 * reinitialize the context for a new pass.
@@ -1220,15 +2285,15 @@ ExecHashTableReset(HashJoinTable hashtable)
 	oldcxt = MemoryContextSwitchTo(hashtable->batchCxt);
 
 	/* Reallocate and reinitialize the hash bucket headers. */
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	hashtable->buckets = (HashJoinBucketHead *)
+		palloc0(nbuckets * sizeof(HashJoinBucketHead));
 
 	hashtable->spaceUsed = 0;
 
 	MemoryContextSwitchTo(oldcxt);
 
 	/* Forget the chunks (the memory was freed by the context reset above). */
-	hashtable->chunks = NULL;
+	hashtable->chunk = NULL;
 }
 
 /*
@@ -1241,10 +2306,14 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	HashJoinTuple tuple;
 	int			i;
 
+	/* TODO: share this work out? */
+
 	/* Reset all flags in the main table ... */
 	for (i = 0; i < hashtable->nbuckets; i++)
 	{
-		for (tuple = hashtable->buckets[i]; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_bucket(hashtable, i);
+			 tuple != NULL;
+			 tuple = next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 
@@ -1252,9 +2321,10 @@ ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 	for (i = 0; i < hashtable->nSkewBuckets; i++)
 	{
 		int			j = hashtable->skewBucketNums[i];
-		HashSkewBucket *skewBucket = hashtable->skewBucket[j];
 
-		for (tuple = skewBucket->tuples; tuple != NULL; tuple = tuple->next)
+		for (tuple = first_tuple_in_skew_bucket(hashtable, j);
+			 tuple != NULL;
+			 tuple = next_tuple_in_bucket(hashtable, tuple))
 			HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(tuple));
 	}
 }
@@ -1414,11 +2484,11 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 				continue;
 
 			/* Okay, create a new skew bucket for this hashvalue. */
-			hashtable->skewBucket[bucket] = (HashSkewBucket *)
+			hashtable->skewBucket[bucket] = (HashSkewBucket *) /* TODO */
 				MemoryContextAlloc(hashtable->batchCxt,
 								   sizeof(HashSkewBucket));
 			hashtable->skewBucket[bucket]->hashvalue = hashvalue;
-			hashtable->skewBucket[bucket]->tuples = NULL;
+			hashtable->skewBucket[bucket]->tuples.private = NULL;
 			hashtable->skewBucketNums[hashtable->nSkewBuckets] = bucket;
 			hashtable->nSkewBuckets++;
 			hashtable->spaceUsed += SKEW_BUCKET_OVERHEAD;
@@ -1496,18 +2566,29 @@ ExecHashSkewTableInsert(HashJoinTable hashtable,
 	MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
 	HashJoinTuple hashTuple;
 	int			hashTupleSize;
+	dsa_pointer tuple_pointer;
 
 	/* Create the HashJoinTuple */
 	hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;
-	hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
-												   hashTupleSize);
+	if (HashJoinTableIsShared(hashtable))
+	{
+		tuple_pointer = dsa_allocate(hashtable->area, hashTupleSize);
+		hashTuple = (HashJoinTuple) dsa_get_address(hashtable->area,
+													tuple_pointer);
+	}
+	else
+	{
+		tuple_pointer = InvalidDsaPointer;
+		hashTuple = (HashJoinTuple) MemoryContextAlloc(hashtable->batchCxt,
+													   hashTupleSize);
+	}
 	hashTuple->hashvalue = hashvalue;
 	memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
 	HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
 
 	/* Push it onto the front of the skew bucket's list */
-	hashTuple->next = hashtable->skewBucket[bucketNumber]->tuples;
-	hashtable->skewBucket[bucketNumber]->tuples = hashTuple;
+	insert_tuple_into_skew_bucket(hashtable, bucketNumber, hashTuple,
+								  tuple_pointer);
 
 	/* Account for space used, and back off if we've used too much */
 	hashtable->spaceUsed += hashTupleSize;
@@ -1538,6 +2619,9 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	int			batchno;
 	HashJoinTuple hashTuple;
 
+	/* TODO: skew buckets not yet supported for parallel mode */
+	Assert(!HashJoinTableIsShared(hashtable));
+
 	/* Locate the bucket to remove */
 	bucketToRemove = hashtable->skewBucketNums[hashtable->nSkewBuckets - 1];
 	bucket = hashtable->skewBucket[bucketToRemove];
@@ -1552,10 +2636,10 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 	ExecHashGetBucketAndBatch(hashtable, hashvalue, &bucketno, &batchno);
 
 	/* Process all tuples in the bucket */
-	hashTuple = bucket->tuples;
+	hashTuple = first_tuple_in_skew_bucket(hashtable, bucketToRemove);
 	while (hashTuple != NULL)
 	{
-		HashJoinTuple nextHashTuple = hashTuple->next;
+		HashJoinTuple nextHashTuple = next_tuple_in_bucket(hashtable, hashTuple);
 		MinimalTuple tuple;
 		Size		tupleSize;
 
@@ -1581,8 +2665,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 			memcpy(copyTuple, hashTuple, tupleSize);
 			pfree(hashTuple);
 
-			copyTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = copyTuple;
+			insert_tuple_into_bucket(hashtable, bucketno, copyTuple,
+									 InvalidDsaPointer);
 
 			/* We have reduced skew space, but overall space doesn't change */
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -1591,8 +2675,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 		{
 			/* Put the tuple into a temp file for later batches */
 			Assert(batchno > hashtable->curbatch);
-			ExecHashJoinSaveTuple(tuple, hashvalue,
-								  &hashtable->innerBatchFile[batchno]);
+			ExecHashJoinSaveTuple(hashtable, tuple, hashvalue,
+								  batchno, true);
 			pfree(hashTuple);
 			hashtable->spaceUsed -= tupleSize;
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -1636,6 +2720,198 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 }
 
 /*
+ * Allocate 'size' bytes from the currently active shared HashMemoryChunk, or
+ * create a new chunk if necessary.  This is similar to the private memory
+ * version, but also deals with 'preload' chunks and coordination with other
+ * participants.
+ *
+ * If respect_work_mem is true, then return NULL if the number of batches has
+ * been increased in order to avoid exceeding work_mem.  Pass false to allow
+ * work_mem to be exceeded (as can be temporarily needed by ExecHashShrink, or
+ * if increasing the number of batches doesn't seem to be helping us shrink
+ * the memory usage).
+ */
+static void *
+dense_alloc_shared(HashJoinTable hashtable,
+				   Size size,
+				   dsa_pointer *shared,
+				   bool preload,
+				   bool respect_work_mem)
+{
+	dsa_pointer chunk_shared;
+	HashMemoryChunk chunk;
+	Size chunk_size;
+
+	/* just in case the size is not already aligned properly */
+	size = MAXALIGN(size);
+
+	/*
+	 * Fast path: if there is enough space in this backend's current chunk,
+	 * then we can allocate without any locking or work_mem accounting.  If
+	 * HASH_CHUNK_SIZE is large enough, this strategy should keep lock
+	 * contention low.  It doesn't matter if another participant has decided
+	 * to increase the number of batches; we'll finish filling up this chunk
+	 * and then find out about the increase when we need to allocate a new
+	 * chunk.
+	 */
+	chunk = preload ? hashtable->chunk_preload : hashtable->chunk;
+	if (chunk != NULL &&
+		size < HASH_CHUNK_THRESHOLD &&
+		chunk->maxlen - chunk->used >= size)
+	{
+		void *result;
+
+		chunk_shared = preload
+			? hashtable->chunk_preload_shared
+			: hashtable->chunk_shared;
+		Assert(chunk == dsa_get_address(hashtable->area, chunk_shared));
+		*shared = chunk_shared +
+			offsetof(HashMemoryChunkData, data) +
+			chunk->used;
+		result = chunk->data + chunk->used;
+		chunk->used += size;
+		chunk->ntuples += 1;
+
+		Assert(chunk->used <= chunk->maxlen);
+		Assert(result == dsa_get_address(hashtable->area, *shared));
+
+		return result;
+	}
+
+	/*
+	 * Slow path: try to allocate a new chunk, while also coordinating with
+	 * other participants to keep memory usage under work_mem by increasing
+	 * the number of batches as required.
+	 */
+	LWLockAcquire(&hashtable->shared->chunk_lock, LW_EXCLUSIVE);
+
+	/* Check if some other participant has increased nbatch. */
+	if (hashtable->shared->nbatch > hashtable->nbatch)
+	{
+		Assert(!preload);
+		Assert(respect_work_mem);
+		extend_batch_file_arrays(hashtable, hashtable->shared->nbatch);
+
+		hashtable->chunk = NULL;
+		hashtable->chunk_shared = InvalidDsaPointer;
+		LWLockRelease(&hashtable->shared->chunk_lock);
+
+		/*
+		 * Whenever nbatch changes, every participant attached to
+		 * shrink_barrier must run ExecHashShrink to help shrink the hash
+		 * table.  So return NULL to tell caller to go and do that.
+		 */
+		return NULL;
+	}
+
+	/* Oversized tuples get their own chunk. */
+	if (size > HASH_CHUNK_THRESHOLD)
+		chunk_size = size + offsetof(HashMemoryChunkData, data);
+	else
+		chunk_size = HASH_CHUNK_SIZE;
+
+	/* If appropriate, check if work_mem would be exceeded by a new chunk. */
+	if (respect_work_mem &&
+		hashtable->shared->shrinking_enabled &&
+		(hashtable->shared->size +
+		 hashtable->shared->size_preloaded +
+		 chunk_size) > (work_mem * 1024L))
+	{
+		/*
+		 * It would.  If allocating for the current batch (ie not preloading
+		 * the next batch), increase number of batches so we can shrink the
+		 * hash table.
+		 */
+		if (!preload)
+		{
+			hashtable->shared->nbatch *= 2;
+			extend_batch_file_arrays(hashtable, hashtable->shared->nbatch);
+
+			/* All allocated chunks now need to be shrunk. */
+			hashtable->shared->chunks_to_shrink = hashtable->shared->chunks;
+			hashtable->shared->chunks = InvalidDsaPointer;
+			hashtable->shared->tuples_in_memory = 0;
+			hashtable->shared->tuples_written_out = 0;
+
+			hashtable->chunk = NULL;
+			hashtable->chunk_shared = InvalidDsaPointer;
+		}
+		LWLockRelease(&hashtable->shared->chunk_lock);
+
+		/*
+		 * If the caller is preloading, it should now stop doing that because
+		 * there is no more work_mem.  If it is loading, it should now run
+		 * ExecHashShrink so we can get some memory back.
+		 */
+		return NULL;
+	}
+
+	/* We are cleared to allocate a new chunk. */
+	chunk_shared = dsa_allocate(hashtable->area, chunk_size);
+	if (!DsaPointerIsValid(chunk_shared))
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("out of memory")));
+	TRACE_POSTGRESQL_HASH_ALLOCATE_CHUNK(chunk_size);
+	if (preload)
+		hashtable->shared->size_preloaded += chunk_size;
+	else
+		hashtable->shared->size += chunk_size;
+
+	/* Set up the chunk. */
+	chunk = (HashMemoryChunk) dsa_get_address(hashtable->area, chunk_shared);
+	*shared = chunk_shared + offsetof(HashMemoryChunkData, data);
+	chunk->maxlen = chunk_size - offsetof(HashMemoryChunkData, data);
+	chunk->used = size;
+	chunk->ntuples = 1;
+
+	/*
+	 * Push it onto the appropriate list of chunks, so that it can be found if
+	 * we need to rebucket or shrink the whole hash table.
+	 */
+	ExecHashPushChunk(hashtable, chunk, chunk_shared,
+					  preload
+					  ? &hashtable->shared->chunks_preloaded
+					  : &hashtable->shared->chunks);
+
+	if (size > HASH_CHUNK_THRESHOLD)
+	{
+		/*
+		 * Count oversized tuples immediately, but don't bother making this
+		 * chunk the 'current' chunk because it has no more space in it for
+		 * next time.
+		 */
+		if (preload)
+			++hashtable->shared->tuples_next_batch;
+		else
+			++hashtable->shared->tuples_this_batch;
+	}
+	else
+	{
+		/*
+		 * Make this the current chunk so that we can use the fast path to
+		 * fill the rest of it up in future called.  We will count this tuple
+		 * later, when the chunk is full.
+		 */
+		if (preload)
+		{
+			hashtable->chunk_preload = chunk;
+			hashtable->chunk_preload_shared = chunk_shared;
+		}
+		else
+		{
+			hashtable->chunk = chunk;
+			hashtable->chunk_shared = chunk_shared;
+		}
+	}
+	LWLockRelease(&hashtable->shared->chunk_lock);
+
+	Assert(chunk->data == dsa_get_address(hashtable->area, *shared));
+
+	return chunk->data;
+}
+
+/*
  * Allocate 'size' bytes from the currently active HashMemoryChunk
  */
 static void *
@@ -1653,26 +2929,28 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 */
 	if (size > HASH_CHUNK_THRESHOLD)
 	{
+
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-								 offsetof(HashMemoryChunkData, data) + size);
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) + size);
 		newChunk->maxlen = size;
 		newChunk->used = 0;
-		newChunk->ntuples = 0;
+		newChunk->ntuples=  0;
 
 		/*
 		 * Add this chunk to the list after the first existing chunk, so that
 		 * we don't lose the remaining space in the "current" chunk.
 		 */
-		if (hashtable->chunks != NULL)
+		if (hashtable->chunk != NULL)
 		{
-			newChunk->next = hashtable->chunks->next;
-			hashtable->chunks->next = newChunk;
+			newChunk->next.private = hashtable->chunk->next.private;
+			hashtable->chunk->next.private = newChunk;
 		}
 		else
 		{
-			newChunk->next = hashtable->chunks;
-			hashtable->chunks = newChunk;
+			newChunk->next.private = NULL;
+			hashtable->chunk = newChunk;
 		}
 
 		newChunk->used += size;
@@ -1685,27 +2963,27 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 * See if we have enough space for it in the current chunk (if any). If
 	 * not, allocate a fresh chunk.
 	 */
-	if ((hashtable->chunks == NULL) ||
-		(hashtable->chunks->maxlen - hashtable->chunks->used) < size)
+	if ((hashtable->chunk == NULL) ||
+		(hashtable->chunk->maxlen - hashtable->chunk->used) < size)
 	{
 		/* allocate new chunk and put it at the beginning of the list */
-		newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
-					  offsetof(HashMemoryChunkData, data) + HASH_CHUNK_SIZE);
-
+		newChunk = (HashMemoryChunk)
+			MemoryContextAlloc(hashtable->batchCxt,
+							   offsetof(HashMemoryChunkData, data) +
+							   HASH_CHUNK_SIZE);
+		newChunk->next.private = hashtable->chunk;
+		hashtable->chunk = newChunk;
 		newChunk->maxlen = HASH_CHUNK_SIZE;
 		newChunk->used = size;
 		newChunk->ntuples = 1;
 
-		newChunk->next = hashtable->chunks;
-		hashtable->chunks = newChunk;
-
 		return newChunk->data;
 	}
 
 	/* There is enough space in the current chunk, let's add the tuple */
-	ptr = hashtable->chunks->data + hashtable->chunks->used;
-	hashtable->chunks->used += size;
-	hashtable->chunks->ntuples += 1;
+	ptr = hashtable->chunk->data + hashtable->chunk->used;
+	hashtable->chunk->used += size;
+	hashtable->chunk->ntuples += 1;
 
 	/* return pointer to the start of the tuple memory */
 	return ptr;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index b41e4e2..e267bab 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -21,8 +21,10 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/barrier.h"
 #include "utils/memutils.h"
-
+#include "utils/probes.h"
 
 /*
  * States of the ExecHashJoin state machine
@@ -42,11 +44,16 @@
 static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode,
 						  HashJoinState *hjstate,
 						  uint32 *hashvalue);
-static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
-						  BufFile *file,
+static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinTable hashtable,
 						  uint32 *hashvalue,
 						  TupleTableSlot *tupleSlot);
 static bool ExecHashJoinNewBatch(HashJoinState *hjstate);
+static void ExecHashJoinLoadBatch(HashJoinState *hjstate);
+static void ExecHashJoinExportAllBatches(HashJoinTable hashtable);
+static void ExecHashJoinExportBatch(HashJoinTable hashtable, int batchno, bool inner);
+static void ExecHashJoinImportBatch(HashJoinTable hashtable,
+									HashJoinBatchReader *reader);
+static void ExecHashJoinPreloadNextBatch(HashJoinState *hjstate);
 
 
 /* ----------------------------------------------------------------
@@ -147,6 +154,14 @@ ExecHashJoin(HashJoinState *node)
 					/* no chance to not build the hash table */
 					node->hj_FirstOuterTupleSlot = NULL;
 				}
+				else if (hashNode->shared_table_data != NULL)
+				{
+					/*
+					 * TODO: The empty-outer optimization is not implemented
+					 * for shared hash tables yet.
+					 */
+					node->hj_FirstOuterTupleSlot = NULL;
+				}
 				else if (HJ_FILL_OUTER(node) ||
 						 (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
 						  !node->hj_OuterNotEmpty))
@@ -166,7 +181,7 @@ ExecHashJoin(HashJoinState *node)
 				/*
 				 * create the hash table
 				 */
-				hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
+				hashtable = ExecHashTableCreate(hashNode,
 												node->hj_HashOperators,
 												HJ_FILL_INNER(node));
 				node->hj_HashTable = hashtable;
@@ -177,12 +192,57 @@ ExecHashJoin(HashJoinState *node)
 				hashNode->hashtable = hashtable;
 				(void) MultiExecProcNode((PlanState *) hashNode);
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(BarrierPhase(&hashtable->shared->barrier) >=
+						   PHJ_PHASE_HASHING);
+
+					/*
+					 * Check if we are a worker that attached too late to
+					 * avoid deadlock risk with the leader, or a leader that
+					 * arrived here too late.
+					 */
+					if (ExecHashCheckForEarlyExit(hashtable))
+					{
+						/*
+						 * Other participants will need to handle all future
+						 * batches written by me.  We don't detach until after
+						 * we've exported all batches, otherwise the phase
+						 * might advance and another participant might try to
+						 * import them.
+						 */
+						if (BarrierPhase(&hashtable->shared->barrier) <=
+							PHJ_PHASE_PROBING)
+							ExecHashJoinExportAllBatches(hashtable);
+						BarrierDetach(&hashtable->shared->barrier);
+						hashtable->detached_early = true;
+						return NULL;
+					}
+
+					/*
+					 * Export just the next batch, if there is one, because it
+					 * is now read-only and other participants may decide to
+					 * read from it.  Future batches can still be written to
+					 * if work_mem is exceeded by any future batch and we
+					 * decide to increase their number, so we can't export
+					 * those yet.  We'll export the batch files written by
+					 * each participant only as they become read-only, but
+					 * before any participant reads from them.
+					 */
+					if (hashtable->nbatch > 1)
+					{
+						ExecHashJoinExportBatch(hashtable, 1, false);
+						ExecHashJoinExportBatch(hashtable, 1, true);
+					}
+				}
+
 				/*
 				 * If the inner relation is completely empty, and we're not
 				 * doing a left outer join, we can quit without scanning the
 				 * outer relation.
 				 */
-				if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
+				if (!HashJoinTableIsShared(hashtable) && /* TODO:TM */
+					hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
 					return NULL;
 
 				/*
@@ -198,12 +258,73 @@ ExecHashJoin(HashJoinState *node)
 				 */
 				node->hj_OuterNotEmpty = false;
 
-				node->hj_JoinState = HJ_NEED_NEW_OUTER;
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Barrier *barrier = &hashtable->shared->barrier;
+					int phase = BarrierPhase(barrier);
+
+					/*
+					 * Map the current phase to the appropriate initial state
+					 * for this worker, so we can get started.
+					 */
+					Assert(BarrierPhase(barrier) >= PHJ_PHASE_PROBING);
+					hashtable->curbatch = PHJ_PHASE_TO_BATCHNO(phase);
+					switch (PHJ_PHASE_TO_SUBPHASE(phase))
+					{
+					case PHJ_SUBPHASE_PROMOTING:
+						/* Wait for serial phase to finish. */
+						BarrierWait(barrier, WAIT_EVENT_HASHJOIN_PROMOTING);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_LOADING);
+						/* fall through */
+					case PHJ_SUBPHASE_LOADING:
+						/* Help load the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinOpenBatch(hashtable, hashtable->curbatch,
+											  true);
+						ExecHashJoinLoadBatch(node);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_PROBING);
+						/* fall through */
+					case PHJ_SUBPHASE_PREPARING:
+						/* Wait for serial phase to finish. */
+						BarrierWait(barrier, WAIT_EVENT_HASHJOIN_PROMOTING);
+						Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(barrier)) ==
+							   PHJ_SUBPHASE_PROBING);
+						/* fall through */
+					case PHJ_SUBPHASE_PROBING:
+						/* Help probe the current batch. */
+						ExecHashUpdate(hashtable);
+						ExecHashJoinOpenBatch(hashtable, hashtable->curbatch,
+											  false);
+						node->hj_JoinState = HJ_NEED_NEW_OUTER;
+						break;
+					case PHJ_SUBPHASE_UNMATCHED:
+						/* Help scan for unmatched inner tuples. */
+						ExecHashUpdate(hashtable);
+						node->hj_JoinState = HJ_FILL_INNER_TUPLES;
+						break;
+					}
+					continue;
+				}
+				else
+				{
+					node->hj_JoinState = HJ_NEED_NEW_OUTER;
+					ExecHashJoinOpenBatch(hashtable, 0, false);
+				}
 
 				/* FALL THRU */
 
 			case HJ_NEED_NEW_OUTER:
 
+				if (HashJoinTableIsShared(hashtable))
+				{
+					Assert(PHJ_PHASE_TO_BATCHNO(BarrierPhase(&hashtable->shared->barrier)) ==
+						   hashtable->curbatch);
+					Assert(PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+						   PHJ_SUBPHASE_PROBING);
+				}
+
 				/*
 				 * We don't have an outer tuple, try to get the next one
 				 */
@@ -213,6 +334,67 @@ ExecHashJoin(HashJoinState *node)
 				if (TupIsNull(outerTupleSlot))
 				{
 					/* end of batch, or maybe whole join */
+
+					/*
+					 * Switch to reading tuples from the next inner batch.  We
+					 * do this here because in the shared hash table case we
+					 * want to do this before ExecHashJoinPreloadNextBatch.
+					 */
+					if (hashtable->curbatch + 1 < hashtable->nbatch)
+						ExecHashJoinOpenBatch(hashtable,
+											  hashtable->curbatch + 1,
+											  true);
+
+					if (HashJoinTableIsShared(hashtable))
+					{
+						/*
+						 * Check if we are a leader that can't go further than
+						 * probing the first batch without deadlock risk,
+						 * because there are workers running.
+						 */
+						if (ExecHashCheckForEarlyExit(hashtable))
+						{
+							/*
+							 * Other backends will need to handle all future
+							 * batches written by me.  We don't detach until
+							 * after we've exported all batches, otherwise
+							 * another participant might try to import them
+							 * too soon.
+							 */
+							ExecHashJoinExportAllBatches(hashtable);
+							BarrierDetach(&hashtable->shared->barrier);
+							hashtable->detached_early = true;
+							return NULL;
+						}
+
+						/*
+						 * We may be able to load some amount of the next
+						 * batch into spare work_mem, before we start waiting
+						 * for other workers to finish probing the current
+						 * batch.
+						 */
+						ExecHashJoinPreloadNextBatch(node);
+
+						/*
+						 * We can't start searching for unmatched tuples until
+						 * all participants have finished probing, so we
+						 * synchronize here.
+						 */
+						if (BarrierWait(&hashtable->shared->barrier,
+										WAIT_EVENT_HASHJOIN_PROBING))
+						{
+							/* Serial phase: prepare for unmatched. */
+							if (HJ_FILL_INNER(node))
+							{
+								hashtable->chunk = NULL;
+								hashtable->shared->chunks_unmatched =
+									hashtable->shared->chunks;
+								hashtable->shared->chunks = InvalidDsaPointer;
+							}
+						}
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_UNMATCHED_BATCH(hashtable->curbatch));
+					}
 					if (HJ_FILL_INNER(node))
 					{
 						/* set up to scan for unmatched inner tuples */
@@ -250,9 +432,9 @@ ExecHashJoin(HashJoinState *node)
 					 * Save it in the corresponding outer-batch file.
 					 */
 					Assert(batchno > hashtable->curbatch);
-					ExecHashJoinSaveTuple(ExecFetchSlotMinimalTuple(outerTupleSlot),
-										  hashvalue,
-										&hashtable->outerBatchFile[batchno]);
+					ExecHashJoinSaveTuple(hashtable,
+										  ExecFetchSlotMinimalTuple(outerTupleSlot),
+										  hashvalue, batchno, false);
 					/* Loop around, staying in HJ_NEED_NEW_OUTER state */
 					continue;
 				}
@@ -296,6 +478,13 @@ ExecHashJoin(HashJoinState *node)
 				if (joinqual == NIL || ExecQual(joinqual, econtext, false))
 				{
 					node->hj_MatchedOuter = true;
+					/*
+					 * Note: it is OK to do this in a shared hash table
+					 * without any kind of memory synchronization, because the
+					 * only transition is 0->1, so ordering doesn't matter if
+					 * several backends do it, and there will be a memory
+					 * barrier before anyone reads it.
+					 */
 					HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
 
 					/* In an antijoin, we never return a matched tuple */
@@ -631,6 +820,88 @@ ExecEndHashJoin(HashJoinState *node)
 	ExecEndNode(innerPlanState(node));
 }
 
+void
+ExecShutdownHashJoin(HashJoinState *node)
+{
+	/*
+	 * TODO: Figure out how to handle this!  For now, just clear the shared
+	 * hash table so that ExecEndHashJoin won't blow up when it's called after
+	 * the dsa_area has been detached...
+	 */
+	if (node->hj_HashTable)
+		node->hj_HashTable->shared = NULL;
+}
+
+/*
+ * For shared hash joins, load as much of the next batch as we can as part of
+ * the probing phase for the current batch.  This overlapping means that we do
+ * something useful with a CPU and the spare memory before we start waiting
+ * for other workers.
+ */
+static void
+ExecHashJoinPreloadNextBatch(HashJoinState *hjstate)
+{
+	HashJoinTable hashtable = hjstate->hj_HashTable;
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Barrier *barrier PG_USED_FOR_ASSERTS_ONLY = &hashtable->shared->barrier;
+		int curbatch = hashtable->curbatch;
+		int next_batch = curbatch + 1;
+		TupleTableSlot *slot;
+		uint32 hashvalue;
+
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_PROBING_BATCH(curbatch));
+
+		/*
+		 * TODO: We can't preload batch 1 at the end of probing batch 0,
+		 * because the leader might call ExecHashJoinExportAllBatches() during
+		 * that phase.  Batches can't be exported by one backend and imported
+		 * and accessed by another in the same phase.  Is there a way to
+		 * reorder things and avoid that problem?
+		 */
+		if (next_batch == 1)
+			return;
+
+		if (next_batch < hashtable->nbatch)
+		{
+			for (;;)
+			{
+				slot = ExecHashJoinGetSavedTuple(hashtable,
+												 &hashvalue,
+												 hjstate->hj_HashTupleSlot);
+				if (slot == NULL)
+				{
+					/*
+					 * We were able to load the whole batch into memory
+					 * without running out of work_mem.
+					 */
+					break;
+				}
+
+				/*
+				 * Try to preload this tuple into a chunk.  It is not actually
+				 * inserted into the hash table yet.
+				 */
+				if (!ExecHashTableInsert(hashtable,
+										 hjstate->hj_HashTupleSlot,
+										 hashvalue,
+										 true)) /* preload */
+				{
+					/*
+					 * There is no more work_mem.  We'll leave this tuple in
+					 * the slot and tell ExecHashJoinLoadBatch to insert it
+					 * once we've finish probing the current hash table.
+					 */
+					hashtable->preloaded_spare_tuple = true;
+					hashtable->preloaded_spare_tuple_hash = hashvalue;
+					return;
+				}
+			}
+		}
+	}
+}
+
 /*
  * ExecHashJoinOuterGetTuple
  *
@@ -680,7 +951,6 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
-
 				return slot;
 			}
 
@@ -699,11 +969,10 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 		 * In outer-join cases, we could get here even though the batch file
 		 * is empty.
 		 */
-		if (file == NULL)
+		if (!HashJoinTableIsShared(hashtable) && file == NULL)
 			return NULL;
 
-		slot = ExecHashJoinGetSavedTuple(hjstate,
-										 file,
+		slot = ExecHashJoinGetSavedTuple(hashtable,
 										 hashvalue,
 										 hjstate->hj_OuterTupleSlot);
 		if (!TupIsNull(slot))
@@ -726,22 +995,26 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	int			nbatch;
 	int			curbatch;
-	BufFile    *innerFile;
-	TupleTableSlot *slot;
-	uint32		hashvalue;
 
 	nbatch = hashtable->nbatch;
 	curbatch = hashtable->curbatch;
 
+	if (HashJoinTableIsShared(hashtable))
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_UNMATCHED_BATCH(curbatch));
+
 	if (curbatch > 0)
 	{
 		/*
 		 * We no longer need the previous outer batch file; close it right
 		 * away to free disk space.
 		 */
+		/* TODO: is this ok for a shared hash table? */
 		if (hashtable->outerBatchFile[curbatch])
+		{
 			BufFileClose(hashtable->outerBatchFile[curbatch]);
-		hashtable->outerBatchFile[curbatch] = NULL;
+			hashtable->outerBatchFile[curbatch] = NULL;
+		}
 	}
 	else	/* we just finished the first batch */
 	{
@@ -776,7 +1049,8 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	 * need to be reassigned.
 	 */
 	curbatch++;
-	while (curbatch < nbatch &&
+	while (!HashJoinTableIsShared(hashtable) &&
+		   curbatch < nbatch &&
 		   (hashtable->outerBatchFile[curbatch] == NULL ||
 			hashtable->innerBatchFile[curbatch] == NULL))
 	{
@@ -792,13 +1066,15 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (hashtable->outerBatchFile[curbatch] &&
 			nbatch != hashtable->nbatch_outstart)
 			break;				/* must process due to rule 3 */
-		/* We can ignore this batch. */
 		/* Release associated temp files right away. */
+		/* TODO review */
 		if (hashtable->innerBatchFile[curbatch])
 			BufFileClose(hashtable->innerBatchFile[curbatch]);
+
 		hashtable->innerBatchFile[curbatch] = NULL;
 		if (hashtable->outerBatchFile[curbatch])
 			BufFileClose(hashtable->outerBatchFile[curbatch]);
+
 		hashtable->outerBatchFile[curbatch] = NULL;
 		curbatch++;
 	}
@@ -812,48 +1088,163 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 	 * Reload the hash table with the new inner batch (which could be empty)
 	 */
 	ExecHashTableReset(hashtable);
+	ExecHashJoinLoadBatch(hjstate);
+
+	return true;
+}
 
-	innerFile = hashtable->innerBatchFile[curbatch];
+static void
+ExecHashJoinLoadBatch(HashJoinState *hjstate)
+{
+	HashJoinTable hashtable = hjstate->hj_HashTable;
+	int			curbatch = hashtable->curbatch;
+	TupleTableSlot *slot;
+	uint32		hashvalue;
+
+	TRACE_POSTGRESQL_HASH_LOADING_START();
 
-	if (innerFile != NULL)
+	if (HashJoinTableIsShared(hashtable))
 	{
-		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-				   errmsg("could not rewind hash-join temporary file: %m")));
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_LOADING_BATCH(curbatch));
 
-		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
-												 innerFile,
-												 &hashvalue,
-												 hjstate->hj_HashTupleSlot)))
-		{
-			/*
-			 * NOTE: some tuples may be sent to future batches.  Also, it is
-			 * possible for hashtable->nbatch to be increased here!
-			 */
-			ExecHashTableInsert(hashtable, slot, hashvalue);
-		}
+		/*
+		 * Shrinking may be triggered while loading, if work_mem is exceeded.
+		 * We need to be attached to shrink_barrier so that we can coordinate
+		 * that among participants.
+		 */
+		BarrierAttach(&hashtable->shared->shrink_barrier);
+	}
+
+	/*
+	 * In HJ_NEED_NEW_OUTER, we already selected the current inner batch for
+	 * reading from.  If there is a shared hash table, we may have already
+	 * partially loaded the hash table in ExecHashJoinPreloadNextBatch.  It
+	 * may have already loaded one tuple that it couldn't insert, so we'll do
+	 * that first.
+	 */
+	Assert(hashtable->batch_reader.batchno == curbatch);
+	Assert(hashtable->batch_reader.inner);
+
+	if (hashtable->preloaded_spare_tuple)
+	{
+		bool success;
+
+		Assert(HashJoinTableIsShared(hashtable));
+		Assert(!TupIsNull(hjstate->hj_HashTupleSlot));
+		success = ExecHashTableInsert(hashtable, hjstate->hj_HashTupleSlot,
+									  hashtable->preloaded_spare_tuple_hash,
+									  false);
+		Assert(success);
+		hashtable->preloaded_spare_tuple = false;
+	}
+
+	/*
+	 * If we preloaded any tuples, we now need to insert them into the
+	 * hashtable.
+	 */
+	ExecHashRebucket(hashtable);
+
+	/* Finally, we can read in the rest of the batch. */
+	for (;;)
+	{
+		slot = ExecHashJoinGetSavedTuple(hashtable,
+										 &hashvalue,
+										 hjstate->hj_HashTupleSlot);
+
+		if (slot == NULL)
+			break;
 
 		/*
-		 * after we build the hash table, the inner batch file is no longer
-		 * needed
+		 * NOTE: some tuples may be sent to future batches.  Also, it is
+		 * possible for hashtable->nbatch to be increased here!
 		 */
-		BufFileClose(innerFile);
-		hashtable->innerBatchFile[curbatch] = NULL;
+		ExecHashTableInsert(hashtable, slot, hashvalue, false);
+	}
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* We have finished any potential shrinking. */
+		BarrierDetach(&hashtable->shared->shrink_barrier);
 	}
 
+	TRACE_POSTGRESQL_HASH_LOADING_DONE();
+
 	/*
-	 * Rewind outer batch file (if present), so that we can start reading it.
+	 * Now that we have finished loading this batch into the hash table, we
+	 * can set our outer batch read head to the start of the current batch,
+	 * and our inner batch read head to the start of the NEXT batch (as
+	 * expected by ExecHashJoinPreloadNextBatch).
 	 */
-	if (hashtable->outerBatchFile[curbatch] != NULL)
+	if (HashJoinTableIsShared(hashtable))
 	{
-		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-				   errmsg("could not rewind hash-join temporary file: %m")));
+		/*
+		 * Wait until all participants have finished loading their portion of
+		 * the hash table.
+		 */
+		if (BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASHJOIN_LOADING))
+		{
+			/* Serial phase: prepare to read this outer and next inner batch */
+			ExecHashJoinRewindBatches(hashtable, hashtable->curbatch);
+		}
+
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_PREPARING_BATCH(hashtable->curbatch));
+		/*
+		 * Since we have finished loading the current batch into memory, the
+		 * batch files generated by this participant for the next batch are
+		 * now read-only.  So it's time to export them for other participants
+		 * to read from if they run out of tuples to read from their own batch
+		 * files.  We'll export the current outer batch, so that it can be
+		 * used for probing, and the next inner batch so that it can be used
+		 * for preloading tuples for the next batch when that is finished.
+		 */
+		ExecHashJoinExportBatch(hashtable, hashtable->curbatch, false);
+		if (hashtable->curbatch + 1 < hashtable->nbatch)
+			ExecHashJoinExportBatch(hashtable, hashtable->curbatch + 1, true);
+
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASHJOIN_PREPARING);
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
 	}
+	else
+		ExecHashJoinRewindBatches(hashtable, hashtable->curbatch);
 
-	return true;
+	/*
+	 * The inner batch file is no longer needed by any participant, because
+	 * the hash table has been fully reloaded.
+	 */
+	ExecHashJoinCloseBatch(hashtable, hashtable->curbatch, true);
+
+	/* Prepare to read from the current outer batch. */
+	ExecHashJoinOpenBatch(hashtable, hashtable->curbatch, false);
+}
+
+/*
+ * Export a BufFile, copy the descriptor to DSA memory and return the
+ * dsa_pointer.
+ */
+static dsa_pointer
+make_batch_descriptor(dsa_area *area, BufFile *file)
+{
+	dsa_pointer pointer;
+	BufFileDescriptor *source;
+	BufFileDescriptor *target;
+	size_t size;
+
+	source = BufFileExport(file);
+	size = BufFileDescriptorSize(source);
+	pointer = dsa_allocate(area, size);
+	if (!DsaPointerIsValid(pointer))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed on dsa_allocate of size %zu.", size)));
+	target = dsa_get_address(area, pointer);
+	memcpy(target, source, size);
+	pfree(source);
+
+	return pointer;
 }
 
 /*
@@ -868,17 +1259,26 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
  * will get messed up.
  */
 void
-ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr)
+ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno,
+					  bool inner)
 {
-	BufFile    *file = *fileptr;
+	BufFile    *file;
 	size_t		written;
 
+	if (inner)
+		file = hashtable->innerBatchFile[batchno];
+	else
+		file = hashtable->outerBatchFile[batchno];
 	if (file == NULL)
 	{
 		/* First write to this batch file, so open it. */
 		file = BufFileCreateTemp(false);
-		*fileptr = file;
+		if (inner)
+			hashtable->innerBatchFile[batchno] = file;
+		else
+			hashtable->outerBatchFile[batchno] = file;
 	}
 
 	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
@@ -892,57 +1292,519 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to hash-join temporary file: %m")));
+
+	TRACE_POSTGRESQL_HASH_SAVE_TUPLE(HashJoinParticipantNumber(),
+									 batchno,
+									 inner);
+}
+
+/*
+ * Export the inner or outer batch file written by this participant for a
+ * given batch number, so that other backends can import and read from it if
+ * they run out of tuples to read from their own files.  This must be done
+ * after this participant has finished writing to the batch, but before any
+ * other participant might attempt to read from it.
+ */
+static void
+ExecHashJoinExportBatch(HashJoinTable hashtable, int batchno, bool inner)
+{
+	HashJoinParticipantState *participant;
+	BufFile *file;
+
+	TRACE_POSTGRESQL_HASHJOIN_EXPORT_BATCH(HashJoinParticipantNumber(),
+										   batchno,
+										   inner);
+
+	Assert(HashJoinTableIsShared(hashtable));
+	Assert(batchno < hashtable->nbatch);
+
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+
+	/* We will export batches one-by-one. */
+	participant->nbatch = -1;
+
+	if (inner)
+	{
+		participant->inner_batchno = batchno;
+		file = hashtable->innerBatchFile[batchno];
+		if (file != NULL)
+			participant->inner_batch_descriptor =
+				make_batch_descriptor(hashtable->area, file);
+		else
+			participant->inner_batch_descriptor =
+				InvalidDsaPointer;
+	}
+	else
+	{
+		participant->outer_batchno = batchno;
+		file = hashtable->outerBatchFile[batchno];
+		if (file != NULL)
+			participant->outer_batch_descriptor =
+				make_batch_descriptor(hashtable->area, file);
+		else
+			participant->outer_batch_descriptor =
+				InvalidDsaPointer;
+	}
+}
+
+/*
+ * Export all future batches.  This must be called by any backend that exits
+ * early, to make sure that the batch files it wrote to can be consumed by
+ * other participants.
+ */
+static void
+ExecHashJoinExportAllBatches(HashJoinTable hashtable)
+{
+	HashJoinParticipantState *participant;
+	dsa_pointer *inner_batch_descriptors;
+	dsa_pointer *outer_batch_descriptors;
+	Size size;
+	BufFile *file;
+	int i;
+
+	/*
+	 * Sanity check that we are in one of the expected phases, in which no
+	 * other participant could be reading the state we are writing.
+	 *
+	 * TODO: See ExecHashJoinPreloadNextBatch where we can't actually preload
+	 * batch 1 because of this.  Need to figure something better out.
+	 *
+	 */
+	Assert(BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_HASHING ||
+		   BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_PROBING);
+
+	TRACE_POSTGRESQL_HASHJOIN_EXPORT_ALL_BATCHES(HashJoinParticipantNumber(),
+												 hashtable->nbatch);
+
+	/* If we didn't generate any batches there is nothing to do. */
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	if (hashtable->nbatch <= 1)
+	{
+		/* No one ever needs to read batch 0. */
+		participant->nbatch = 0;
+		return;
+	}
+
+	/* Set up space for descriptors for all my batches. */
+	participant->nbatch = hashtable->nbatch;
+	size = sizeof(dsa_pointer) * hashtable->nbatch;
+	participant->inner_batch_descriptors = dsa_allocate(hashtable->area, size);
+	participant->outer_batch_descriptors = dsa_allocate(hashtable->area, size);
+	if (!DsaPointerIsValid(participant->inner_batch_descriptors) ||
+		!DsaPointerIsValid(participant->outer_batch_descriptors))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed on dsa_allocate of size %zu.", size)));
+	inner_batch_descriptors =
+		dsa_get_address(hashtable->area,
+						participant->inner_batch_descriptors);
+	outer_batch_descriptors =
+		dsa_get_address(hashtable->area,
+						participant->outer_batch_descriptors);
+	memset(inner_batch_descriptors, 0, size);
+	memset(outer_batch_descriptors, 0, size);
+
+	/* Now export all batches that were written by this participant. */
+	for (i = hashtable->curbatch + 1; i < hashtable->nbatch; ++i)
+	{
+		file = hashtable->innerBatchFile[i];
+		if (file != NULL)
+			inner_batch_descriptors[i] =
+				make_batch_descriptor(hashtable->area, file);
+		file = hashtable->outerBatchFile[i];
+		if (file != NULL)
+			outer_batch_descriptors[i] =
+				make_batch_descriptor(hashtable->area, file);
+	}
+}
+
+/*
+ * Import a batch that was exported by another participant, so that this
+ * process can read it.  The participant and batch numbers should be already
+ * set in the reader object that is passed in.
+ */
+static void
+ExecHashJoinImportBatch(HashJoinTable hashtable, HashJoinBatchReader *reader)
+{
+	dsa_pointer descriptor = InvalidDsaPointer;
+	HashJoinParticipantState *participant;
+
+	TRACE_POSTGRESQL_HASHJOIN_IMPORT_BATCH(reader->participant_number,
+										   reader->batchno,
+										   reader->inner);
+
+	Assert(reader->participant_number >= 0 &&
+		   reader->participant_number < hashtable->shared->planned_participants);
+
+	/* Find the participant referenced by the reader. */
+	participant = &hashtable->shared->participants[reader->participant_number];
+
+	/* Find the descriptor exported by that participant for that batch. */
+	if (participant->nbatch != -1)
+	{
+		/* It exported all its batches and left.  Find the correct one. */
+		if (reader->batchno < participant->nbatch)
+		{
+			dsa_pointer *descriptors;
+
+			Assert(DsaPointerIsValid(participant->inner_batch_descriptors));
+			Assert(DsaPointerIsValid(participant->outer_batch_descriptors));
+			descriptors =
+				dsa_get_address(hashtable->area,
+								reader->inner
+								? participant->inner_batch_descriptors
+								: participant->outer_batch_descriptors);
+			if (DsaPointerIsValid(descriptors[reader->batchno]))
+				descriptor = descriptors[reader->batchno];
+		}
+	}
+	else
+	{
+		/* It must have just exported the exact batch we expect. */
+		Assert((reader->inner &&
+				(reader->batchno == participant->inner_batchno)) ||
+			   (!reader->inner &&
+				(reader->batchno == participant->outer_batchno)));
+
+		if (reader->inner)
+			descriptor = participant->inner_batch_descriptor;
+		else
+			descriptor = participant->outer_batch_descriptor;
+	}
+
+	/* Import the BufFile, if we found one. */
+	if (DsaPointerIsValid(descriptor))
+	{
+		reader->head.fileno = reader->head.offset = -1;
+		reader->file = BufFileImport(dsa_get_address(hashtable->area,
+													 descriptor));
+		if (reader->inner)
+			reader->shared = &participant->inner_batch_reader;
+		else
+			reader->shared = &participant->outer_batch_reader;
+		Assert(reader->shared->batchno == reader->batchno);
+	}
+	else
+	{
+		reader->file = NULL;
+		reader->shared = NULL;
+	}
+}
+
+/*
+ * Select the batch file that ExecHashJoinGetSavedTuple will read from.
+ */
+void
+ExecHashJoinOpenBatch(HashJoinTable hashtable, int batchno, bool inner)
+{
+	HashJoinBatchReader *batch_reader = &hashtable->batch_reader;
+
+	TRACE_POSTGRESQL_HASHJOIN_OPEN_BATCH(HashJoinParticipantNumber(),
+										 batchno,
+										 inner);
+
+	if (batchno == 0)
+		batch_reader->file = NULL;
+	else
+		batch_reader->file = inner
+			? hashtable->innerBatchFile[batchno]
+			: hashtable->outerBatchFile[batchno];
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		HashJoinParticipantState *participant;
+
+		/* Initially we will read from the caller's batch file. */
+		participant =
+			&hashtable->shared->participants[HashJoinParticipantNumber()];
+		batch_reader->shared = inner
+			? &participant->inner_batch_reader
+			: &participant->outer_batch_reader;
+		/* Seek to the shared position at next read. */
+		batch_reader->head.fileno = -1;
+		batch_reader->head.offset = -1;
+	}
+	else
+	{
+		batch_reader->shared = NULL;
+		/* Seek to start of batch now, if there is one. */
+		if (batch_reader->file != NULL)
+			BufFileSeek(batch_reader->file, 0, 0, SEEK_SET);
+	}
+
+	batch_reader->participant_number = HashJoinParticipantNumber();
+	batch_reader->batchno = batchno;
+	batch_reader->inner = inner;
+}
+
+/*
+ * Close a batch, once it is not needed by any participant.  This causes batch
+ * files created by this participant to be deleted.
+ */
+void
+ExecHashJoinCloseBatch(HashJoinTable hashtable, int batchno, bool inner)
+{
+	HashJoinParticipantState *participant;
+	HashJoinBatchReader *batch_reader;
+	BufFile *file;
+
+	/*
+	 * We only need to close the batch owned by THIS participant.  That causes
+	 * it to be deleted.  Batches opened in this backend but created by other
+	 * participants are closed by ExecHashJoinGetSavedTuple when it reaches
+	 * the end of the file, allowing them to be closed sooner.
+	 */
+	batch_reader = &hashtable->batch_reader;
+	participant = &hashtable->shared->participants[HashJoinParticipantNumber()];
+	if (inner)
+	{
+		file = hashtable->innerBatchFile[batchno];
+		hashtable->innerBatchFile[batchno] = NULL;
+	}
+	else
+	{
+		file = hashtable->outerBatchFile[batchno];
+		hashtable->outerBatchFile[batchno] = NULL;
+	}
+	if (file == NULL)
+		return;
+
+	Assert(batch_reader->file == NULL || file == batch_reader->file);
+
+	BufFileClose(file);
+	batch_reader->file = NULL;
+}
+
+/*
+ * Rewind batch readers.  The outer batch reader is rewound to the start of
+ * batchno.  The inner batch reader is rewound to the start of batchno + 1, in
+ * anticipation of preloading the next batch.
+ */
+void
+ExecHashJoinRewindBatches(HashJoinTable hashtable, int batchno)
+{
+	HashJoinBatchReader *batch_reader;
+	int i;
+
+	batch_reader = &hashtable->batch_reader;
+
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(&hashtable->shared->barrier) == PHJ_PHASE_CREATING ||
+			   (PHJ_PHASE_TO_SUBPHASE(BarrierPhase(&hashtable->shared->barrier)) ==
+				PHJ_SUBPHASE_PREPARING &&
+				PHJ_PHASE_TO_BATCHNO(BarrierPhase(&hashtable->shared->barrier)) ==
+				batchno));
+
+		/* Position the shared read heads for each participant's batch. */
+		for (i = 0; i < hashtable->shared->planned_participants; ++i)
+		{
+			HashJoinSharedBatchReader *reader;
+
+			reader = &hashtable->shared->participants[i].outer_batch_reader;
+			reader->batchno = batchno; /* for probing this batch */
+			reader->head.fileno = 0;
+			reader->head.offset = 0;
+
+			reader = &hashtable->shared->participants[i].inner_batch_reader;
+			reader->batchno = batchno + 1; /* for preloading the next batch */
+			reader->head.fileno = 0;
+			reader->head.offset = 0;
+		}
+	}
 }
 
 /*
  * ExecHashJoinGetSavedTuple
- *		read the next tuple from a batch file.  Return NULL if no more.
+ *		read the next tuple from the batch selected with
+ *		ExecHashJoinOpenBatch, including the batch files of
+ *		other participants if the hash table is shared.  Return NULL if no
+ *		more.
  *
  * On success, *hashvalue is set to the tuple's hash value, and the tuple
  * itself is stored in the given slot.
  */
 static TupleTableSlot *
-ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
-						  BufFile *file,
+ExecHashJoinGetSavedTuple(HashJoinTable hashtable,
 						  uint32 *hashvalue,
 						  TupleTableSlot *tupleSlot)
 {
-	uint32		header[2];
-	size_t		nread;
-	MinimalTuple tuple;
+	TupleTableSlot *result = NULL;
+	HashJoinBatchReader *batch_reader = &hashtable->batch_reader;
 
-	/*
-	 * Since both the hash value and the MinimalTuple length word are uint32,
-	 * we can read them both in one BufFileRead() call without any type
-	 * cheating.
-	 */
-	nread = BufFileRead(file, (void *) header, sizeof(header));
-	if (nread == 0)				/* end of file */
+	for (;;)
 	{
-		ExecClearTuple(tupleSlot);
-		return NULL;
-	}
-	if (nread != sizeof(header))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
-	*hashvalue = header[0];
-	tuple = (MinimalTuple) palloc(header[1]);
-	tuple->t_len = header[1];
-	nread = BufFileRead(file,
-						(void *) ((char *) tuple + sizeof(uint32)),
-						header[1] - sizeof(uint32));
-	if (nread != header[1] - sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
+		uint32		header[2];
+		size_t		nread;
+		MinimalTuple tuple;
+		bool		can_close = false;
+
+		if (batch_reader->file == NULL)
+		{
+			/*
+			 * No file found for the current participant.  Try stealing tuples
+			 * from the next participant.
+			 */
+			goto next_participant;
+		}
+
+		if (HashJoinTableIsShared(hashtable))
+		{
+			Assert((batch_reader->inner &&
+					batch_reader->shared ==
+					&hashtable->shared->participants[batch_reader->participant_number].inner_batch_reader) ||
+				   (!batch_reader->inner &&
+					batch_reader->shared ==
+					&hashtable->shared->participants[batch_reader->participant_number].outer_batch_reader));
+
+			LWLockAcquire(&batch_reader->shared->lock, LW_EXCLUSIVE);
+			Assert(batch_reader->shared->batchno == batch_reader->batchno);
+			if (batch_reader->shared->error)
+			{
+				/* Don't try to read if reading failed in some other backend. */
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read from hash-join temporary file")));
+			}
+
+			/* Set the shared error flag, which we'll clear if we succeed. */
+			batch_reader->shared->error = true;
+
+			/*
+			 * If another worker has moved the shared read head since we last read,
+			 * we'll need to seek to the new shared position.
+			 */
+			if (batch_reader->head.fileno != batch_reader->shared->head.fileno ||
+				batch_reader->head.offset != batch_reader->shared->head.offset)
+			{
+				TRACE_POSTGRESQL_HASH_SEEK(HashJoinParticipantNumber(),
+										   batch_reader->participant_number,
+										   batch_reader->batchno,
+										   batch_reader->inner,
+										   batch_reader->shared->head.fileno,
+										   batch_reader->shared->head.offset);
+				BufFileSeek(batch_reader->file,
+							batch_reader->shared->head.fileno,
+							batch_reader->shared->head.offset,
+							SEEK_SET);
+				batch_reader->head = batch_reader->shared->head;
+			}
+		}
+
+		/* Try to read the size and hash. */
+		nread = BufFileRead(batch_reader->file, (void *) header, sizeof(header));
+		if (nread > 0)
+		{
+			if (nread != sizeof(header))
+			{
+				ereport(ERROR,
+					(errcode_for_file_access(),
 				 errmsg("could not read from hash-join temporary file: %m")));
-	return ExecStoreMinimalTuple(tuple, tupleSlot, true);
-}
+			}
+			*hashvalue = header[0];
+			tuple = (MinimalTuple) palloc(header[1]);
+			tuple->t_len = header[1];
+			nread = BufFileRead(batch_reader->file,
+								(void *) ((char *) tuple + sizeof(uint32)),
+								header[1] - sizeof(uint32));
+			if (nread != header[1] - sizeof(uint32))
+			{
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read from hash-join temporary file: %m")));
+			}
+
+			TRACE_POSTGRESQL_HASH_GET_SAVED_TUPLE(HashJoinParticipantNumber(),
+												  batch_reader->participant_number,
+												  batch_reader->batchno,
+												  batch_reader->inner);
+			result = ExecStoreMinimalTuple(tuple, tupleSlot, true);
+		}
+
+		if (HashJoinTableIsShared(hashtable))
+		{
+			if (nread == 0 &&
+				batch_reader->participant_number !=
+				HashJoinParticipantNumber())
+			{
+				/*
+				 * We've reached the end of another paticipant's batch file,
+				 * so close it now.  We'll deal with closing THIS
+				 * participant's batch file later, because we don't want the
+				 * files to be deleted just yet.
+				 */
+				can_close = true;
+			}
+			/* Commit new head position to shared memory and clear error. */
+			BufFileTell(batch_reader->file,
+						&batch_reader->head.fileno,
+						&batch_reader->head.offset);
+			batch_reader->shared->head = batch_reader->head;
+			batch_reader->shared->error = false;
+			if (nread == 0)
+				TRACE_POSTGRESQL_HASH_TELL(HashJoinParticipantNumber(),
+										   batch_reader->participant_number,
+										   batch_reader->batchno,
+										   batch_reader->inner,
+										   batch_reader->shared->head.fileno,
+										   batch_reader->shared->head.offset);
+			LWLockRelease(&batch_reader->shared->lock);
+		}
+
+		if (can_close)
+		{
+			BufFileClose(batch_reader->file);
+			batch_reader->file = NULL;
+		}
+
+		if (result != NULL)
+			return result;
+
+next_participant:
+		if (!HashJoinTableIsShared(hashtable))
+		{
+			/* Private hash table, end of batch. */
+			ExecClearTuple(tupleSlot); /* TODO:TM also needed for shared n'est-ce pas? */
+			return NULL;
+		}
+
+		/* Try the next participant's batch file. */
+		batch_reader->participant_number =
+			(batch_reader->participant_number + 1) %
+				hashtable->shared->planned_participants;
+		if (batch_reader->participant_number == HashJoinParticipantNumber())
+		{
+			/*
+			 * We've made it all the way back to the file we started with,
+			 * which is the one that this backend wrote.  So there are no more
+			 * tuples to be had in any participant's batch file.
+			 */
+			ExecClearTuple(tupleSlot);
+			return NULL;
+		}
 
+		/* Import the BufFile from that participant, if it exported one. */
+		ExecHashJoinImportBatch(hashtable, batch_reader);
+	}
+}
 
 void
 ExecReScanHashJoin(HashJoinState *node)
 {
+	HashState *hashNode = (HashState *) innerPlanState(node);
+
+	/* We can't use HashJoinTableIsShared if the table is NULL. */
+	if (hashNode->shared_table_data != NULL)
+	{
+		elog(ERROR, "TODO: shared ExecReScanHashJoin not yet implemented");
+
+		/* Coordinate a rewind to the shared hash table creation phase. */
+		BarrierWaitSet(&hashNode->shared_table_data->barrier,
+					   PHJ_PHASE_BEGINNING,
+					   WAIT_EVENT_HASHJOIN_REWINDING);
+	}
+
 	/*
 	 * In a multi-batch join, we currently have to do rescans the hard way,
 	 * primarily because batch temp files may have already been released. But
@@ -977,6 +1839,14 @@ ExecReScanHashJoin(HashJoinState *node)
 
 			/* ExecHashJoin can skip the BUILD_HASHTABLE step */
 			node->hj_JoinState = HJ_NEED_NEW_OUTER;
+
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+				/* Coordinate a rewind to the shared probing phase. */
+				BarrierWaitSet(&hashNode->shared_table_data->barrier,
+							   PHJ_PHASE_PROBING,
+							   WAIT_EVENT_HASHJOIN_REWINDING2);
+			}
 		}
 		else
 		{
@@ -985,6 +1855,14 @@ ExecReScanHashJoin(HashJoinState *node)
 			node->hj_HashTable = NULL;
 			node->hj_JoinState = HJ_BUILD_HASHTABLE;
 
+			if (HashJoinTableIsShared(node->hj_HashTable))
+			{
+				/* Coordinate a rewind to the shared hash table creation phase. */
+				BarrierWaitSet(&hashNode->shared_table_data->barrier,
+							   PHJ_PHASE_BEGINNING,
+							   WAIT_EVENT_HASHJOIN_REWINDING3);
+			}
+
 			/*
 			 * if chgParam of subnode is not null then plan will be re-scanned
 			 * by first ExecProcNode.
@@ -1011,3 +1889,97 @@ ExecReScanHashJoin(HashJoinState *node)
 	if (node->js.ps.lefttree->chgParam == NULL)
 		ExecReScan(node->js.ps.lefttree);
 }
+
+void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt)
+{
+	size_t size;
+
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * (pcxt->nworkers + 1);
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+void
+ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
+{
+	HashState *hashNode;
+	SharedHashJoinTable shared;
+	size_t size;
+	int planned_participants;
+	int i;
+
+	/*
+	 * Disable shared hash table mode if we failed to create a real DSM
+	 * segment, because that means that we don't have a DSA area to work
+	 * with.
+	 */
+	if (pcxt->seg == NULL)
+		return;
+
+	/*
+	 * Set up the state needed to coordinate access to the shared hash table,
+	 * using the plan node ID as the toc key.
+	 */
+	planned_participants = pcxt->nworkers + 1;	/* possible workers + leader */
+	size = offsetof(SharedHashJoinTableData, participants) +
+		sizeof(HashJoinParticipantState) * planned_participants;
+	shared = shm_toc_allocate(pcxt->toc, size);
+	BarrierInit(&shared->barrier, 0);
+	BarrierInit(&shared->shrink_barrier, 0);
+	shared->buckets = InvalidDsaPointer;
+	shared->chunks = InvalidDsaPointer;
+	shared->chunks_preloaded = InvalidDsaPointer;
+	shared->chunks_to_rebucket = InvalidDsaPointer;
+	shared->chunks_to_shrink = InvalidDsaPointer;
+	shared->chunks_unmatched = InvalidDsaPointer;
+	shared->planned_participants = planned_participants;
+	shared->size = 0;
+	shared->size_preloaded = 0;
+	shared->shrinking_enabled = true;
+	shm_toc_insert(pcxt->toc, state->js.ps.plan->plan_node_id, shared);
+
+	/* Initialize the LWLocks. */
+	LWLockInitialize(&shared->chunk_lock, LWTRANCHE_PARALLEL_HASH_JOIN_CHUNK);
+	for (i = 0; i < planned_participants; ++i)
+	{
+		LWLockInitialize(&shared->participants[i].inner_batch_reader.lock,
+						 LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER);
+		LWLockInitialize(&shared->participants[i].outer_batch_reader.lock,
+						 LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER);
+	}
+
+	/*
+	 * Pass the SharedHashJoinTable to the hash node.  If the Gather node
+	 * running in the leader backend decides to execute the hash join, it
+	 * hasn't called ExecHashJoinInitializeWorker so it doesn't have
+	 * state->shared_table_data set up.  So we must do it here.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = shared;
+}
+
+void
+ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc)
+{
+	HashState  *hashNode;
+
+	state->hj_sharedHashJoinTable =
+		shm_toc_lookup(toc, state->js.ps.plan->plan_node_id);
+
+	/*
+	 * Inject SharedHashJoinTable into the hash node.  It could instead have
+	 * its own ExecHashInitializeWorker function, but we only want to set its
+	 * 'parallel_aware' flag if we want to tell it to actually build the hash
+	 * table in parallel.  Since its parallel_aware flag also controls whether
+	 * its 'InitializeWorker' function gets called, and it also needs access
+	 * to this object for serial shared hash mode, we'll pass it on here
+	 * instead of depending on that.
+	 */
+	hashNode = (HashState *) innerPlanState(state);
+	hashNode->shared_table_data = state->hj_sharedHashJoinTable;
+	Assert(hashNode->shared_table_data != NULL);
+
+	Assert(HashJoinParticipantNumber() <
+		   hashNode->shared_table_data->planned_participants);
+}
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 439a946..df1d574 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
 
+#include <unistd.h>
+
 static void InitScanRelation(SeqScanState *node, EState *estate, int eflags);
 static TupleTableSlot *SeqNext(SeqScanState *node);
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 806d0a9..a2beb27 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1993,6 +1993,7 @@ _outHashPath(StringInfo str, const HashPath *node)
 
 	WRITE_NODE_FIELD(path_hashclauses);
 	WRITE_INT_FIELD(num_batches);
+	WRITE_ENUM_FIELD(table_type, HashPathTableType);
 }
 
 static void
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a52eb7e..2856bcd 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -104,6 +104,7 @@
 double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
 double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
+double		cpu_shared_tuple_cost = DEFAULT_CPU_SHARED_TUPLE_COST;
 double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
@@ -2693,16 +2694,19 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors)
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type)
 {
 	Cost		startup_cost = 0;
 	Cost		run_cost = 0;
 	double		outer_path_rows = outer_path->rows;
 	double		inner_path_rows = inner_path->rows;
+	double		inner_path_rows_total = inner_path_rows;
 	int			num_hashclauses = list_length(hashclauses);
 	int			numbuckets;
 	int			numbatches;
 	int			num_skew_mcvs;
+	size_t		space_allowed;		/* not used */
 
 	/* cost of source data */
 	startup_cost += outer_path->startup_cost;
@@ -2724,8 +2728,43 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
 
 	/*
+	 * If this is a shared hash table, there is a extra charge for inserting
+	 * each tuple into the shared hash table, to cover the overhead of memory
+	 * synchronization that makes the hash table slightly slower to build than
+	 * a private hash table.  There is no extra charge for probing the hash
+	 * table for outer path row, on the basis that read-only access to the
+	 * hash table shouldn't generate any extra memory synchronization.
+	 *
+	 * cpu_shared_tuple_cost acts a tie-breaker controlling whether we prefer
+	 * HASH_TABLE_PRIVATE or HASH_TABLE_SHARED_SERIAL plans, when the hash
+	 * table fits in work_mem, since the cost is otherwise the same.  If it is
+	 * positive, then we'll prefer private hash tables, even though that means
+	 * that we'll be running N copies of the inner plan.  Running N copies of
+	 * the copies of the inner plan in parallel is not considered more
+	 * expensive than running 1 copy of the inner plan while N-1 participants
+	 * do nothing, despite doing less work in total.
+	 */
+	if (table_type != HASHPATH_TABLE_PRIVATE)
+		startup_cost += cpu_shared_tuple_cost * inner_path_rows;
+
+	/*
+	 * If this is a parallel shared hash table, then the value we have for
+	 * inner_rows refers only to the rows returned by each participant.  For
+	 * shared hash table size estimation, we need the total number, so we need
+	 * to undo the division.
+	 */
+	if (table_type == HASHPATH_TABLE_SHARED_PARALLEL)
+		inner_path_rows_total *= outer_path->parallel_workers + 1;
+
+	/*
 	 * Get hash table size that executor would use for inner relation.
 	 *
+	 * Shared hash tables are allowed to be larger to make up for the fact
+	 * that there is only one copy shared by all parallel query participants,
+	 * which may reduce the number of batches.  That means that
+	 * HASH_TABLE_SHARED_SERIAL is likely to beat HASH_TABLE_PRIVATE when we
+	 * expect to exceed work_mem.
+	 *
 	 * XXX for the moment, always assume that skew optimization will be
 	 * performed.  As long as SKEW_WORK_MEM_PERCENT is small, it's not worth
 	 * trying to determine that for sure.
@@ -2733,9 +2772,12 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	 * XXX at some point it might be interesting to try to account for skew
 	 * optimization in the cost estimate, but for now, we don't.
 	 */
-	ExecChooseHashTableSize(inner_path_rows,
+	ExecChooseHashTableSize(inner_path_rows_total,
 							inner_path->pathtarget->width,
 							true,		/* useskew */
+							table_type != HASHPATH_TABLE_PRIVATE, /* shared */
+							outer_path->parallel_workers,
+							&space_allowed,
 							&numbuckets,
 							&numbatches,
 							&num_skew_mcvs);
@@ -2746,12 +2788,19 @@ initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	 * time.  Charge seq_page_cost per page, since the I/O should be nice and
 	 * sequential.  Writing the inner rel counts as startup cost, all the rest
 	 * as run cost.
+	 *
+	 * If the hash table is HASH_TABLE_PRIVATE, then every participant will
+	 * write a copy of every batch file, but this happens in parallel so we
+	 * don't consider that to be more expensive than the
+	 * HASH_TABLE_SHARED_SERIAL case where only one participant does that.  It
+	 * is not clear how the costing should be affected by higher disk
+	 * bandwidth usage.
 	 */
 	if (numbatches > 1)
 	{
 		double		outerpages = page_size(outer_path_rows,
 										   outer_path->pathtarget->width);
-		double		innerpages = page_size(inner_path_rows,
+		double		innerpages = page_size(inner_path_rows_total,
 										   inner_path->pathtarget->width);
 
 		startup_cost += seq_page_cost * innerpages;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 7c30ec6..209b9d1 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -492,7 +492,8 @@ try_hashjoin_path(PlannerInfo *root,
 				  Path *inner_path,
 				  List *hashclauses,
 				  JoinType jointype,
-				  JoinPathExtraData *extra)
+				  JoinPathExtraData *extra,
+				  HashPathTableType table_type)
 {
 	Relids		required_outer;
 	JoinCostWorkspace workspace;
@@ -517,7 +518,7 @@ try_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors, table_type);
 
 	if (add_path_precheck(joinrel,
 						  workspace.startup_cost, workspace.total_cost,
@@ -534,7 +535,8 @@ try_hashjoin_path(PlannerInfo *root,
 									  inner_path,
 									  extra->restrictlist,
 									  required_outer,
-									  hashclauses));
+									  hashclauses,
+									  table_type));
 	}
 	else
 	{
@@ -555,7 +557,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 						  Path *inner_path,
 						  List *hashclauses,
 						  JoinType jointype,
-						  JoinPathExtraData *extra)
+						  JoinPathExtraData *extra,
+						  HashPathTableType table_type)
 {
 	JoinCostWorkspace workspace;
 
@@ -580,7 +583,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 	 */
 	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
 						  outer_path, inner_path,
-						  extra->sjinfo, &extra->semifactors);
+						  extra->sjinfo, &extra->semifactors,
+						  table_type);
 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
 		return;
 
@@ -596,7 +600,8 @@ try_partial_hashjoin_path(PlannerInfo *root,
 										  inner_path,
 										  extra->restrictlist,
 										  NULL,
-										  hashclauses));
+										  hashclauses,
+										  table_type));
 }
 
 /*
@@ -1401,7 +1406,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			/* no possibility of cheap startup here */
 		}
 		else if (jointype == JOIN_UNIQUE_INNER)
@@ -1417,7 +1423,8 @@ hash_inner_and_outer(PlannerInfo *root,
 							  cheapest_total_inner,
 							  hashclauses,
 							  jointype,
-							  extra);
+							  extra,
+							  HASHPATH_TABLE_PRIVATE);
 			if (cheapest_startup_outer != NULL &&
 				cheapest_startup_outer != cheapest_total_outer)
 				try_hashjoin_path(root,
@@ -1426,7 +1433,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 		}
 		else
 		{
@@ -1447,7 +1455,8 @@ hash_inner_and_outer(PlannerInfo *root,
 								  cheapest_total_inner,
 								  hashclauses,
 								  jointype,
-								  extra);
+								  extra,
+								  HASHPATH_TABLE_PRIVATE);
 
 			foreach(lc1, outerrel->cheapest_parameterized_paths)
 			{
@@ -1481,7 +1490,8 @@ hash_inner_and_outer(PlannerInfo *root,
 									  innerpath,
 									  hashclauses,
 									  jointype,
-									  extra);
+									  extra,
+									  HASHPATH_TABLE_PRIVATE);
 				}
 			}
 		}
@@ -1490,23 +1500,32 @@ hash_inner_and_outer(PlannerInfo *root,
 		 * If the joinrel is parallel-safe, we may be able to consider a
 		 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
 		 * because the outer path will be partial, and therefore we won't be
-		 * able to properly guarantee uniqueness.  Similarly, we can't handle
-		 * JOIN_FULL and JOIN_RIGHT, because they can produce false null
-		 * extended rows.  Also, the resulting path must not be parameterized.
+		 * able to properly guarantee uniqueness.  Also, the resulting path
+		 * must not be parameterized.
 		 */
 		if (joinrel->consider_parallel &&
 			save_jointype != JOIN_UNIQUE_OUTER &&
-			save_jointype != JOIN_FULL &&
-			save_jointype != JOIN_RIGHT &&
 			outerrel->partial_pathlist != NIL &&
 			bms_is_empty(joinrel->lateral_relids))
 		{
 			Path	   *cheapest_partial_outer;
+			Path	   *cheapest_partial_inner = NULL;
 			Path	   *cheapest_safe_inner = NULL;
 
 			cheapest_partial_outer =
 				(Path *) linitial(outerrel->partial_pathlist);
 
+			/* Can we use a partial inner plan too? */
+			if (innerrel->partial_pathlist != NIL)
+				cheapest_partial_inner =
+					(Path *) linitial(innerrel->partial_pathlist);
+			if (cheapest_partial_inner != NULL)
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_partial_inner,
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_PARALLEL);
+
 			/*
 			 * Normally, given that the joinrel is parallel-safe, the cheapest
 			 * total inner path will also be parallel-safe, but if not, we'll
@@ -1534,10 +1553,27 @@ hash_inner_and_outer(PlannerInfo *root,
 			}
 
 			if (cheapest_safe_inner != NULL)
+			{
+				/* Try a shared table with only one worker building the table. */
 				try_partial_hashjoin_path(root, joinrel,
 										  cheapest_partial_outer,
 										  cheapest_safe_inner,
-										  hashclauses, jointype, extra);
+										  hashclauses, jointype, extra,
+										  HASHPATH_TABLE_SHARED_SERIAL);
+				/*
+				 * Also try private hash tables, built by each worker, but
+				 * only if it's not a FULL or RIGHT join.  Those rely on being
+				 * able to track which hash table entries have been matched,
+				 * but we don't have a way to unify the HEAP_TUPLE_HAS_MATCH
+				 * flags from all the private copies of the hash table.
+				 */
+				if (save_jointype != JOIN_FULL && save_jointype != JOIN_RIGHT)
+					try_partial_hashjoin_path(root, joinrel,
+											  cheapest_partial_outer,
+											  cheapest_safe_inner,
+											  hashclauses, jointype, extra,
+											  HASHPATH_TABLE_PRIVATE);
+			}
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c7bcd9b..cac4932 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3938,6 +3938,23 @@ create_hashjoin_plan(PlannerInfo *root,
 	copy_plan_costsize(&hash_plan->plan, inner_plan);
 	hash_plan->plan.startup_cost = hash_plan->plan.total_cost;
 
+	/*
+	 * Set the table as sharable if appropriate, with parallel or serial
+	 * building.
+	 */
+	switch (best_path->table_type)
+	{
+	case HASHPATH_TABLE_SHARED_PARALLEL:
+		hash_plan->shared_table = true;
+		hash_plan->plan.parallel_aware = true;
+		break;
+	case HASHPATH_TABLE_SHARED_SERIAL:
+		hash_plan->shared_table = true;
+		break;
+	case HASHPATH_TABLE_PRIVATE:
+		break;
+	}
+
 	join_plan = make_hashjoin(tlist,
 							  joinclauses,
 							  otherclauses,
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3b7c56d..a1d7b20 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2096,6 +2096,7 @@ create_mergejoin_path(PlannerInfo *root,
  * 'required_outer' is the set of required outer rels
  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
  *		(this should be a subset of the restrict_clauses list)
+ * 'table_type' for level of hash table sharing
  */
 HashPath *
 create_hashjoin_path(PlannerInfo *root,
@@ -2108,7 +2109,8 @@ create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses)
+					 List *hashclauses,
+					 HashPathTableType table_type)
 {
 	HashPath   *pathnode = makeNode(HashPath);
 
@@ -2123,9 +2125,13 @@ create_hashjoin_path(PlannerInfo *root,
 								  sjinfo,
 								  required_outer,
 								  &restrict_clauses);
-	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_aware =
+		joinrel->consider_parallel &&
+		(table_type == HASHPATH_TABLE_SHARED_SERIAL ||
+		 table_type == HASHPATH_TABLE_SHARED_PARALLEL);
 	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
+	pathnode->table_type = table_type;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
 	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f37a0bf..d562fef 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3392,6 +3392,63 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_HASH_CREATING:
+			event_name = "Hash/Creating";
+			break;
+		case WAIT_EVENT_HASH_HASHING:
+			event_name = "Hash/Hashing";
+			break;
+		case WAIT_EVENT_HASH_SHRINKING1:
+			event_name = "Hash/Shrinking1";
+			break;
+		case WAIT_EVENT_HASH_SHRINKING2:
+			event_name = "Hash/Shrinking2";
+			break;
+		case WAIT_EVENT_HASH_SHRINKING3:
+			event_name = "Hash/Shrinking3";
+			break;
+		case WAIT_EVENT_HASH_SHRINKING4:
+			event_name = "Hash/Shrinking4";
+			break;
+		case WAIT_EVENT_HASH_RESIZING:
+			event_name = "Hash/Resizing";
+			break;
+		case WAIT_EVENT_HASH_REBUCKETING:
+			event_name = "Hash/Rebucketing";
+			break;
+		case WAIT_EVENT_HASH_BEGINNING:
+			event_name = "Hash/Beginning";
+			break;
+		case WAIT_EVENT_HASH_DESTROY:
+			event_name = "Hash/Destroy";
+			break;
+		case WAIT_EVENT_HASH_UNMATCHED:
+			event_name = "Hash/Unmatched";
+			break;
+		case WAIT_EVENT_HASH_PROMOTING:
+			event_name = "Hash/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROMOTING:
+			event_name = "HashJoin/Promoting";
+			break;
+		case WAIT_EVENT_HASHJOIN_PREPARING:
+			event_name = "HashJoin/Preparing";
+			break;
+		case WAIT_EVENT_HASHJOIN_PROBING:
+			event_name = "HashJoin/Probing";
+			break;
+		case WAIT_EVENT_HASHJOIN_LOADING:
+			event_name = "HashJoin/Loading";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING:
+			event_name = "HashJoin/Rewinding";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING2:
+			event_name = "HashJoin/Rewinding2";;
+			break;
+		case WAIT_EVENT_HASHJOIN_REWINDING3:
+			event_name = "HashJoin/Rewinding3";;
+			break;
 		/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 7ebd636..18ffd4e 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -40,8 +40,11 @@
 #include "storage/fd.h"
 #include "storage/buffile.h"
 #include "storage/buf_internals.h"
+#include "utils/probes.h"
 #include "utils/resowner.h"
 
+extern int ParallelWorkerNumber;
+
 /*
  * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
  * The reason is that we'd like large temporary BufFiles to be spread across
@@ -89,6 +92,24 @@ struct BufFile
 	char		buffer[BLCKSZ];
 };
 
+/*
+ * Serialized representation of a single file managed by a BufFile.
+ */
+typedef struct BufFileFileDescriptor
+{
+	char path[MAXPGPATH];
+} BufFileFileDescriptor;
+
+/*
+ * Serialized representation of a BufFile, to be created by BufFileExport and
+ * consumed by BufFileImport.
+ */
+struct BufFileDescriptor
+{
+	size_t num_files;
+	BufFileFileDescriptor files[FLEXIBLE_ARRAY_MEMBER];
+};
+
 static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
@@ -178,6 +199,81 @@ BufFileCreateTemp(bool interXact)
 	return file;
 }
 
+/*
+ * Export a BufFile description in a serialized form so that another backend
+ * can attach to it and read from it.  The format is opaque, but it may be
+ * bitwise copied, and its size may be obtained with BufFileDescriptorSize().
+ */
+BufFileDescriptor *
+BufFileExport(BufFile *file)
+{
+	BufFileDescriptor *descriptor;
+	int i;
+
+	/* Flush output from local buffers. */
+	BufFileFlush(file);
+
+	/* Create and fill in a descriptor. */
+	descriptor = palloc0(offsetof(BufFileDescriptor, files) +
+						 sizeof(BufFileFileDescriptor) * file->numFiles);
+	descriptor->num_files = file->numFiles;
+	for (i = 0; i < descriptor->num_files; ++i)
+	{
+		TRACE_POSTGRESQL_BUFFILE_EXPORT_FILE(FilePathName(file->files[i]));
+		strcpy(descriptor->files[i].path, FilePathName(file->files[i]));
+	}
+
+	return descriptor;
+}
+
+/*
+ * Return the size in bytes of a BufFileDescriptor, so that it can be copied.
+ */
+size_t
+BufFileDescriptorSize(const BufFileDescriptor *descriptor)
+{
+	return offsetof(BufFileDescriptor, files) +
+		sizeof(BufFileFileDescriptor) * descriptor->num_files;
+}
+
+/*
+ * Open a BufFile that was created by another backend and then exported.  The
+ * file must be read-only in all backends, and is still owned by the backend
+ * that created it.  This provides a way for cooperating backends to share
+ * immutable temporary data such as hash join batches.
+ */
+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{
+	BufFile    *file = (BufFile *) palloc0(sizeof(BufFile));
+	int i;
+
+	file->numFiles = descriptor->num_files;
+	file->files = (File *) palloc0(sizeof(File) * descriptor->num_files);
+	file->offsets = (off_t *) palloc0(sizeof(off_t) * descriptor->num_files);
+	file->isTemp = false;
+	file->isInterXact = true; /* prevent cleanup by this backend */
+	file->dirty = false;
+	file->resowner = CurrentResourceOwner;
+	file->curFile = 0;
+	file->curOffset = 0L;
+	file->pos = 0;
+	file->nbytes = 0;
+
+	for (i = 0; i < descriptor->num_files; ++i)
+	{
+		TRACE_POSTGRESQL_BUFFILE_IMPORT_FILE(descriptor->files[i].path);
+		file->files[i] =
+			PathNameOpenFile(descriptor->files[i].path,
+							 O_RDONLY | PG_BINARY, 0600);
+		if (file->files[i] <= 0)
+			elog(ERROR, "failed to import \"%s\": %m",
+				 descriptor->files[i].path);
+	}
+
+	return file;
+}
+
 #ifdef NOT_USED
 /*
  * Create a BufFile and attach it to an already-opened virtual File.
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 1cf0684..833b059 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -510,6 +510,12 @@ RegisterLWLockTranches(void)
 						  "predicate_lock_manager");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
+	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER,
+						  "hash_join_inner_batches");
+	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER,
+						  "hash_join_outer_batches");
+	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN_CHUNK,
+						  "hash_join_chunk");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5b23dbf..fdb6d24 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2855,6 +2855,16 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 	{
+		{"cpu_shared_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
+			gettext_noop("Sets the planner's estimate of the cost of "
+						 "sharing each tuple with other parallel workers."),
+			NULL
+		},
+		&cpu_shared_tuple_cost,
+		DEFAULT_CPU_SHARED_TUPLE_COST, -DBL_MAX, DBL_MAX,
+		NULL, NULL, NULL
+	},
+	{
 		{"cpu_index_tuple_cost", PGC_USERSET, QUERY_TUNING_COST,
 			gettext_noop("Sets the planner's estimate of the cost of "
 						 "processing each index entry during an index scan."),
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 146fce9..3239c3c 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -60,6 +60,40 @@ provider postgresql {
 	probe sort__start(int, bool, int, int, bool);
 	probe sort__done(bool, long);
 
+	probe hash__leader__early__exit();
+	probe hash__worker__early__exit();
+	probe hash__hashing__start();
+	probe hash__hashing__done();
+	probe hash__loading__start();
+	probe hash__loading__done();
+	probe hash__increase__buckets(int);
+	probe hash__increase__batches(int);
+	probe hash__shrink__start(int);
+	probe hash__shrink__done();
+	probe hash__shrink__chunk();
+	probe hash__shrink__disabled();
+	probe hash__shrink__stats(size_t, size_t, size_t, size_t);
+	probe hash__rebucket__start();
+	probe hash__rebucket__done(int);
+	probe hash__free__chunk(size_t);
+	probe hash__allocate__chunk(size_t);
+	probe hash__save__tuple(int, int, int);
+	probe hash__get__saved__tuple(int, int, int, int);
+	probe hash__seek(int, int, int, int, int, size_t);
+	probe hash__tell(int, int, int, int, int, size_t);
+	probe hash__insert(int);
+	probe hash__probe(int, int);
+
+	probe hashjoin__start();
+	probe hashjoin__done();
+	probe hashjoin__export__all__batches(int, int);
+	probe hashjoin__export__batch(int, int, bool);
+	probe hashjoin__import__batch(int, int, bool);
+	probe hashjoin__open__batch(int, int, bool);
+
+	probe buffile__import__file(const char *);
+	probe buffile__export__file(const char *);
+
 	probe buffer__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool);
 	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool, bool);
 	probe buffer__flush__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index ac84053..2effc77 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -15,7 +15,13 @@
 #define HASHJOIN_H
 
 #include "nodes/execnodes.h"
+#include "port/atomics.h"
+#include "storage/barrier.h"
 #include "storage/buffile.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/spin.h"
+#include "utils/dsa.h"
 
 /* ----------------------------------------------------------------
  *				hash-join hash table structures
@@ -63,7 +69,12 @@
 
 typedef struct HashJoinTupleData
 {
-	struct HashJoinTupleData *next;		/* link to next tuple in same bucket */
+	/* link to next tuple in same bucket */
+	union
+	{
+		dsa_pointer shared;
+		struct HashJoinTupleData *private;
+	} next;
 	uint32		hashvalue;		/* tuple's hash code */
 	/* Tuple data, in MinimalTuple format, follows on a MAXALIGN boundary */
 }	HashJoinTupleData;
@@ -94,7 +105,12 @@ typedef struct HashJoinTupleData
 typedef struct HashSkewBucket
 {
 	uint32		hashvalue;		/* common hash value */
-	HashJoinTuple tuples;		/* linked list of inner-relation tuples */
+	/* linked list of inner-relation tuples */
+	union
+	{
+		dsa_pointer shared;
+		HashJoinTuple private;
+	} tuples;
 } HashSkewBucket;
 
 #define SKEW_BUCKET_OVERHEAD  MAXALIGN(sizeof(HashSkewBucket))
@@ -103,8 +119,9 @@ typedef struct HashSkewBucket
 #define SKEW_MIN_OUTER_FRACTION  0.01
 
 /*
- * To reduce palloc overhead, the HashJoinTuples for the current batch are
- * packed in 32kB buffers instead of pallocing each tuple individually.
+ * To reduce palloc/dsa_allocate overhead, the HashJoinTuples for the current
+ * batch are packed in 32kB buffers instead of pallocing each tuple
+ * individually.
  */
 typedef struct HashMemoryChunkData
 {
@@ -112,17 +129,137 @@ typedef struct HashMemoryChunkData
 	size_t		maxlen;			/* size of the buffer holding the tuples */
 	size_t		used;			/* number of buffer bytes already used */
 
-	struct HashMemoryChunkData *next;	/* pointer to the next chunk (linked
-										 * list) */
+	/* pointer to the next chunk (linked  list) */
+	union
+	{
+		dsa_pointer shared;
+		struct HashMemoryChunkData *private;
+	} next;
 
 	char		data[FLEXIBLE_ARRAY_MEMBER];	/* buffer allocated at the end */
 }	HashMemoryChunkData;
 
 typedef struct HashMemoryChunkData *HashMemoryChunk;
 
+
+
 #define HASH_CHUNK_SIZE			(32 * 1024L)
 #define HASH_CHUNK_THRESHOLD	(HASH_CHUNK_SIZE / 4)
 
+/*
+ * Read head position in a shared batch file.
+ */
+typedef struct HashJoinBatchPosition
+{
+	int fileno;
+	off_t offset;
+} HashJoinBatchPosition;
+
+/*
+ * The state exposed in shared memory by each participant to coordinate
+ * reading of batch files that it wrote.
+ */
+typedef struct HashJoinSharedBatchReader
+{
+	int batchno;				/* the batch number we are currently reading */
+
+	LWLock lock;				/* protects access to the members below */
+	bool error;					/* has an IO error occurred? */
+	HashJoinBatchPosition head;	/* shared read head for current batch */
+} HashJoinSharedBatchReader;
+
+/*
+ * The state exposed in shared memory by each participant allowing its batch
+ * files to be read by other participants.
+ */
+typedef struct HashJoinParticipantState
+{
+	/*
+	 * To allow other participants to read from this participant's batch
+	 * files, this participant publishes its batch descriptors (or invalid
+	 * pointers) here.
+	 */
+	int inner_batchno;
+	int outer_batchno;
+	dsa_pointer inner_batch_descriptor;
+	dsa_pointer outer_batch_descriptor;
+
+	/*
+	 * In the case of participants that exit early, they must publish all
+	 * their future batches, rather than publishing them one by one above.
+	 * These point to an array of dsa_pointers to BufFileDescriptor objects.
+	 */
+	int nbatch;
+	dsa_pointer inner_batch_descriptors;
+	dsa_pointer outer_batch_descriptors;
+
+	/*
+	 * The shared state used to coordinate reading from the current batch.  We
+	 * need separate objects for the outer and inner side, because in the
+	 * probing phase some participants can be reading from the outer batch,
+	 * while others can be reading from the inner side to preload the next
+	 * batch.
+	 */
+	HashJoinSharedBatchReader inner_batch_reader;
+	HashJoinSharedBatchReader outer_batch_reader;
+} HashJoinParticipantState;
+
+/*
+ * The state used by each backend to manage reading from batch files written
+ * by all participants.
+ */
+typedef struct HashJoinBatchReader
+{
+	int participant_number;				/* read which participant's batch? */
+	int batchno;						/* which batch are we reading? */
+	bool inner;							/* inner or outer? */
+	HashJoinSharedBatchReader *shared;	/* holder of the shared read head */
+	BufFile *file;						/* the file opened in this backend */
+	HashJoinBatchPosition head;			/* local read head position */
+} HashJoinBatchReader;
+
+/*
+ * State for a shared hash join table.  Each backend participating in a hash
+ * join with a shared hash table also has a HashJoinTableData object in
+ * backend-private memory, which points to this shared state in the DSM
+ * segment.
+ */
+typedef struct SharedHashJoinTableData
+{
+	Barrier barrier;				/* synchronization for the whole join */
+	Barrier shrink_barrier;			/* synchronization to shrink hashtable */
+	dsa_pointer buckets;			/* primary hash table */
+	bool at_least_one_worker;		/* did at least one worker join in time? */
+	int nbuckets;
+	int nbuckets_optimal;
+	int nbatch;
+
+	LWLock chunk_lock;				/* protects the following members */
+	dsa_pointer chunks;				/* chunks loaded for the current batch */
+	dsa_pointer chunks_preloaded;	/* chunks preloaded for the next batch */
+	dsa_pointer chunks_to_rebucket;	/* chunks with tuples to insert */
+	dsa_pointer chunks_to_shrink;	/* chunks needing to be thinned out */
+	dsa_pointer chunks_unmatched;	/* chunks for unmatched scanning */
+	Size tuples_this_batch;			/* number of tuples in chunks */
+	Size tuples_next_batch;			/* number of tuples in chunks_preloaded */
+	Size tuples_in_memory;			/* shared counter while rebatching */
+	Size tuples_written_out;		/* shared counter while rebatching */
+	Size size;						/* size of buckets + chunks */
+	Size size_preloaded;			/* size of chunks_preloaded */
+	bool shrinking_enabled;
+
+	int planned_participants;		/* number of planned workers + leader */
+
+	/* state exposed by each participant for sharing batches */
+	HashJoinParticipantState participants[FLEXIBLE_ARRAY_MEMBER];
+} SharedHashJoinTableData;
+
+typedef union HashJoinBucketHead
+{
+	dsa_pointer_atomic shared;
+	HashJoinTuple private;
+} HashJoinBucketHead;
+
 typedef struct HashJoinTableData
 {
 	int			nbuckets;		/* # buckets in the in-memory hash table */
@@ -134,7 +271,7 @@ typedef struct HashJoinTableData
 	int			log2_nbuckets_optimal;	/* log2(nbuckets_optimal) */
 
 	/* buckets[i] is head of list of tuples in i'th in-memory bucket */
-	struct HashJoinTupleData **buckets;
+	HashJoinBucketHead *buckets;
 	/* buckets array is per-batch storage, as are all the tuples */
 
 	bool		keepNulls;		/* true to store unmatchable NULL tuples */
@@ -185,7 +322,84 @@ typedef struct HashJoinTableData
 	MemoryContext batchCxt;		/* context for this-batch-only storage */
 
 	/* used for dense allocation of tuples (into linked chunks) */
-	HashMemoryChunk chunks;		/* one list for the whole batch */
+	HashMemoryChunk chunk;		/* current chunk  */
+	HashMemoryChunk chunk_preload;	/* current chunk for next batch */
+	HashMemoryChunk chunks_to_rebucket;	/* after resizing table */
+	HashMemoryChunk chunks_to_shrink;
+	int chunk_unmatched_pos;	/* head when scanning for unmatched tuples */
+
+	/* State for coordinating shared tables for parallel hash joins. */
+	dsa_area *area;
+	SharedHashJoinTableData *shared;	/* the shared state */
+	int attached_at_phase;				/* the phase this participant joined */
+	bool detached_early;				/* did we decide to detach early? */
+	HashJoinBatchReader batch_reader;	/* state for reading batches in */
+	bool preloaded_spare_tuple;			/* is there an extra preloaded tuple? */
+	uint32 preloaded_spare_tuple_hash;	/* the tuple's hash value if so */
+	dsa_pointer chunk_shared;			/* DSA pointer to 'chunk' */
+	dsa_pointer chunk_preload_shared;	/* DSA pointer to 'chunk_preload' */
+
 }	HashJoinTableData;
 
+/* Check if a HashJoinTable is shared by parallel workers. */
+#define HashJoinTableIsShared(table) ((table)->shared != NULL)
+
+/* The phases of a parallel hash join. */
+#define PHJ_PHASE_BEGINNING				0
+#define PHJ_PHASE_CREATING				1
+#define PHJ_PHASE_HASHING				2
+#define PHJ_PHASE_RESIZING				3
+#define PHJ_PHASE_REBUCKETING			4
+#define PHJ_PHASE_PROBING				5	/* PHJ_PHASE_PROBING_BATCH(0) */
+#define PHJ_PHASE_UNMATCHED				6	/* PHJ_PHASE_UNMATCHED_BATCH(0) */
+
+/* The subphases for batches. */
+#define PHJ_SUBPHASE_PROMOTING			0
+#define PHJ_SUBPHASE_LOADING			1
+#define PHJ_SUBPHASE_PREPARING			2
+#define PHJ_SUBPHASE_PROBING			3
+#define PHJ_SUBPHASE_UNMATCHED			4
+
+/* The phases of parallel processing for batch(n). */
+#define PHJ_PHASE_PROMOTING_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 4)
+#define PHJ_PHASE_LOADING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 5 - 3)
+#define PHJ_PHASE_PREPARING_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 2)
+#define PHJ_PHASE_PROBING_BATCH(n)		(PHJ_PHASE_UNMATCHED + (n) * 5 - 1)
+#define PHJ_PHASE_UNMATCHED_BATCH(n)	(PHJ_PHASE_UNMATCHED + (n) * 5 - 0)
+
+/* Phase number -> sub-phase within a batch. */
+#define PHJ_PHASE_TO_SUBPHASE(p)										\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) % 5)
+
+/* Phase number -> batch number. */
+#define PHJ_PHASE_TO_BATCHNO(p)											\
+	(((int)(p) - PHJ_PHASE_UNMATCHED + PHJ_SUBPHASE_UNMATCHED) / 5)
+
+/*
+ * Is a given phase one in which a new hash table array is being assigned by
+ * one elected backend?  That includes initial creation, reallocation during
+ * resize, and promotion of secondary hash table to primary.  Workers that
+ * show up and attach at an arbitrary time must wait such phases out before
+ * doing anything with the hash table.
+ */
+#define PHJ_PHASE_MUTATING_TABLE(p)									\
+	((p) == PHJ_PHASE_CREATING ||									\
+	 (p) == PHJ_PHASE_RESIZING ||									\
+	 (PHJ_PHASE_TO_BATCHNO(p) > 0 &&								\
+	  PHJ_PHASE_TO_SUBPHASE(p) == PHJ_SUBPHASE_PROMOTING))
+
+/* The phases of ExecHashShrink. */
+#define PHJ_SHRINK_PHASE_BEGINNING		0
+#define PHJ_SHRINK_PHASE_CLEARING		1
+#define PHJ_SHRINK_PHASE_WORKING		2
+#define PHJ_SHRINK_PHASE_DECIDING		3
+
+/*
+ * Return the 'participant number' for a process participating in a parallel
+ * hash join.  We give a number < hashtable->shared->planned_participants
+ * to each potential participant, including the leader.
+ */
+#define HashJoinParticipantNumber() \
+	(IsParallelWorker() ? ParallelWorkerNumber + 1 : 0)
+
 #endif   /* HASHJOIN_H */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index fe5c264..a7a5c6e 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -22,12 +22,12 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(Hash *node, List *hashOperators,
+extern HashJoinTable ExecHashTableCreate(HashState *node, List *hashOperators,
 					bool keepNulls);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
-extern void ExecHashTableInsert(HashJoinTable hashtable,
+extern bool ExecHashTableInsert(HashJoinTable hashtable,
 					TupleTableSlot *slot,
-					uint32 hashvalue);
+					uint32 hashvalue, bool secondary);
 extern bool ExecHashGetHashValue(HashJoinTable hashtable,
 					 ExprContext *econtext,
 					 List *hashkeys,
@@ -45,9 +45,14 @@ extern bool ExecScanHashTableForUnmatched(HashJoinState *hjstate,
 extern void ExecHashTableReset(HashJoinTable hashtable);
 extern void ExecHashTableResetMatchFlags(HashJoinTable hashtable);
 extern void ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
+						bool shared, int parallel_workers,
+						size_t *spaceAllowed,
 						int *numbuckets,
 						int *numbatches,
 						int *num_skew_mcvs);
 extern int	ExecHashGetSkewBucket(HashJoinTable hashtable, uint32 hashvalue);
+extern void ExecHashUpdate(HashJoinTable hashtable);
+extern bool ExecHashCheckForEarlyExit(HashJoinTable hashtable);
+extern void ExecHashRebucket(HashJoinTable hashtable);
 
 #endif   /* NODEHASH_H */
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index ddc32b1..ef7d935 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -14,15 +14,28 @@
 #ifndef NODEHASHJOIN_H
 #define NODEHASHJOIN_H
 
+#include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "storage/buffile.h"
+#include "storage/shm_toc.h"
 
 extern HashJoinState *ExecInitHashJoin(HashJoin *node, EState *estate, int eflags);
 extern TupleTableSlot *ExecHashJoin(HashJoinState *node);
 extern void ExecEndHashJoin(HashJoinState *node);
+extern void ExecShutdownHashJoin(HashJoinState *node);
 extern void ExecReScanHashJoin(HashJoinState *node);
 
-extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr);
+extern void ExecHashJoinSaveTuple(HashJoinTable hashtable,
+					  MinimalTuple tuple, uint32 hashvalue,
+					  int batchno, bool inner);
+extern void ExecHashJoinRewindBatches(HashJoinTable hashtable, int batchno);
+extern void ExecHashJoinOpenBatch(HashJoinTable hashtable,
+					  int batchno, bool inner);
+extern void ExecHashJoinCloseBatch(HashJoinTable hashtable,
+					  int batchno, bool inner);
+
+extern void ExecHashJoinEstimate(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeDSM(HashJoinState *state, ParallelContext *pcxt);
+extern void ExecHashJoinInitializeWorker(HashJoinState *state, shm_toc *toc);
 
 #endif   /* NODEHASHJOIN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ce13bf7..deb8497 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
+#include "utils/dsa.h"
 #include "utils/hsearch.h"
 #include "utils/reltrigger.h"
 #include "utils/sortsupport.h"
@@ -1755,6 +1756,7 @@ typedef struct MergeJoinState
 /* these structs are defined in executor/hashjoin.h: */
 typedef struct HashJoinTupleData *HashJoinTuple;
 typedef struct HashJoinTableData *HashJoinTable;
+typedef struct SharedHashJoinTableData *SharedHashJoinTable;
 
 typedef struct HashJoinState
 {
@@ -1776,6 +1778,7 @@ typedef struct HashJoinState
 	int			hj_JoinState;
 	bool		hj_MatchedOuter;
 	bool		hj_OuterNotEmpty;
+	SharedHashJoinTable hj_sharedHashJoinTable;
 } HashJoinState;
 
 
@@ -2006,6 +2009,9 @@ typedef struct HashState
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
 	List	   *hashkeys;		/* list of ExprState nodes */
 	/* hashkeys is same as parent's hj_InnerHashKeys */
+
+	/* The following are the same as the parent's. */
+	SharedHashJoinTable shared_table_data;
 } HashState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 692a626..6d1460b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -782,6 +782,7 @@ typedef struct Hash
 	bool		skewInherit;	/* is outer join rel an inheritance tree? */
 	Oid			skewColType;	/* datatype of the outer key column */
 	int32		skewColTypmod;	/* typmod of the outer key column */
+	bool		shared_table;	/* table shared by multiple workers? */
 	/* all other info is in the parent HashJoin node */
 } Hash;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e1d31c7..43f9515 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1258,6 +1258,16 @@ typedef struct MergePath
 	bool		materialize_inner;		/* add Materialize to inner? */
 } MergePath;
 
+typedef enum
+{
+	/* Every worker builds its own private copy of the hash table. */
+	HASHPATH_TABLE_PRIVATE,
+	/* One worker builds a shared hash table, and all workers probe it. */
+	HASHPATH_TABLE_SHARED_SERIAL,
+	/* All workers build a shared hash table, and then probe it. */
+	HASHPATH_TABLE_SHARED_PARALLEL
+} HashPathTableType;
+
 /*
  * A hashjoin path has these fields.
  *
@@ -1272,6 +1282,7 @@ typedef struct HashPath
 	JoinPath	jpath;
 	List	   *path_hashclauses;		/* join clauses used for hashing */
 	int			num_batches;	/* number of batches expected */
+	HashPathTableType table_type;		/* level of sharedness */
 } HashPath;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 39376ec..220c013 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -24,6 +24,7 @@
 #define DEFAULT_SEQ_PAGE_COST  1.0
 #define DEFAULT_RANDOM_PAGE_COST  4.0
 #define DEFAULT_CPU_TUPLE_COST	0.01
+#define DEFAULT_CPU_SHARED_TUPLE_COST 0.001
 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
 #define DEFAULT_CPU_OPERATOR_COST  0.0025
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
@@ -48,6 +49,7 @@ typedef enum
 extern PGDLLIMPORT double seq_page_cost;
 extern PGDLLIMPORT double random_page_cost;
 extern PGDLLIMPORT double cpu_tuple_cost;
+extern PGDLLIMPORT double cpu_shared_tuple_cost;
 extern PGDLLIMPORT double cpu_index_tuple_cost;
 extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
@@ -144,7 +146,8 @@ extern void initial_cost_hashjoin(PlannerInfo *root,
 					  List *hashclauses,
 					  Path *outer_path, Path *inner_path,
 					  SpecialJoinInfo *sjinfo,
-					  SemiAntiJoinFactors *semifactors);
+					  SemiAntiJoinFactors *semifactors,
+					  HashPathTableType table_type);
 extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 					JoinCostWorkspace *workspace,
 					SpecialJoinInfo *sjinfo,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d16f879..42633c5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -134,7 +134,8 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
 					 Path *inner_path,
 					 List *restrict_clauses,
 					 Relids required_outer,
-					 List *hashclauses);
+					 List *hashclauses,
+					 HashPathTableType table_type);
 
 extern ProjectionPath *create_projection_path(PlannerInfo *root,
 					   RelOptInfo *rel,
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5b37894..f54b0a5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -785,7 +785,26 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_HASH_CREATING,
+	WAIT_EVENT_HASH_HASHING,
+	WAIT_EVENT_HASH_RESIZING,
+	WAIT_EVENT_HASH_REBUCKETING,
+	WAIT_EVENT_HASH_BEGINNING,
+	WAIT_EVENT_HASH_DESTROY,
+	WAIT_EVENT_HASH_UNMATCHED,
+	WAIT_EVENT_HASH_PROMOTING,
+	WAIT_EVENT_HASH_SHRINKING1,
+	WAIT_EVENT_HASH_SHRINKING2,
+	WAIT_EVENT_HASH_SHRINKING3,
+	WAIT_EVENT_HASH_SHRINKING4,
+	WAIT_EVENT_HASHJOIN_PROMOTING,
+	WAIT_EVENT_HASHJOIN_PROBING,
+	WAIT_EVENT_HASHJOIN_LOADING,
+	WAIT_EVENT_HASHJOIN_PREPARING,
+	WAIT_EVENT_HASHJOIN_REWINDING,
+	WAIT_EVENT_HASHJOIN_REWINDING2, /* TODO: rename me */
+	WAIT_EVENT_HASHJOIN_REWINDING3 /* TODO: rename me */
 } WaitEventIPC;
 
 /* ----------
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index fe00bf0..023eb3f 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -30,12 +30,17 @@
 
 typedef struct BufFile BufFile;
 
+typedef struct BufFileDescriptor BufFileDescriptor;
+
 /*
  * prototypes for functions in buffile.c
  */
 
 extern BufFile *BufFileCreateTemp(bool interXact);
 extern void BufFileClose(BufFile *file);
+extern BufFileDescriptor *BufFileExport(BufFile *file);
+extern BufFile *BufFileImport(BufFileDescriptor *descriptor);
+extern size_t BufFileDescriptorSize(const BufFileDescriptor *descriptor);
 extern size_t BufFileRead(BufFile *file, void *ptr, size_t size);
 extern size_t BufFileWrite(BufFile *file, void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 8bd93c3..dd6d48e 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -211,6 +211,9 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_BUFFER_MAPPING,
 	LWTRANCHE_LOCK_MANAGER,
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
+	LWTRANCHE_PARALLEL_HASH_JOIN_INNER_BATCH_READER,
+	LWTRANCHE_PARALLEL_HASH_JOIN_OUTER_BATCH_READER,
+	LWTRANCHE_PARALLEL_HASH_JOIN_CHUNK,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_FIRST_USER_DEFINED
 }	BuiltinTrancheIds;

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#8)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Jan 7, 2017 at 9:01 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Tue, Jan 3, 2017 at 10:53 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I will post a new rebased version soon with that and
some other nearby problems fixed.

Here is a new WIP patch.

I forgot to mention: this applies on top of barrier-v5.patch, over here:

/messages/by-id/CAEepm=3g3EC734kgriWseiJPfUQZeoMWdhAfzOc0ecewAa5uXg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#8)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Jan 7, 2017 at 9:01 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Tue, Jan 3, 2017 at 10:53 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I will post a new rebased version soon with that and
some other nearby problems fixed.

Here is a new WIP patch.

To make this easier to understand and harmonise the logic used in a
few places, I'm now planning to chop it up into a patch series,
probably something like this:

1. Change existing hash join code to use chunk-based accounting
2. Change existing hash join code to use a new interface for dealing
with batches
3. Add shared hash join support, single batch only
4. Add components for doing shared batch reading (unused)
5. Add multi-batch shared hash join support

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Thomas Munro (#8)

Re: WIP: [[Parallel] Shared] Hash

On Fri, Jan 6, 2017 at 12:01 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here is a new WIP patch. I have plenty of things to tidy up (see note
at end), but the main ideas are now pretty clear and I'd appreciate
some feedback.

I have some review feedback for your V3. I've chosen to start with the
buffile.c stuff, since of course it might share something with my
parallel tuplesort patch. This isn't comprehensive, but I will have
more comprehensive feedback soon.

I'm not surprised that you've generally chosen to make shared BufFile
management as simple as possible, with no special infrastructure other
than the ability to hold open other backend temp files concurrently
within a worker, and no writing to another worker's temp file, or
shared read pointer. As you put it, everything is immutable. I
couldn't see much opportunity for adding a lot of infrastructure that
wasn't written explicitly as parallel hash join code/infrastructure.
My sense is that that was a good decision. I doubted that you'd ever
want some advanced, generic shared BufFile thing with multiple read
pointers, built-in cache coherency, etc. (Robert seemed to think that
you'd go that way, though.)

Anyway, some more specific observations:

* ISTM that this is the wrong thing for shared BufFiles:

+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{

...

+ file->isInterXact = true; /* prevent cleanup by this backend */

There is only one user of isInterXact = true BufFiles at present,
tuplestore.c. It, in turn, only does so for cases that require
persistent tuple stores. A quick audit of these tuplestore.c callers
show this to just be cursor support code within portalmem.c. Here is
the relevant tuplestore_begin_heap() rule that that code adheres to,
unlike the code I've quoted above:

* interXact: if true, the files used for on-disk storage persist beyond the
* end of the current transaction. NOTE: It's the caller's responsibility to
* create such a tuplestore in a memory context and resource owner that will
* also survive transaction boundaries, and to ensure the tuplestore is closed
* when it's no longer wanted.

I don't think it's right for buffile.c to know anything about file
paths directly -- I'd say that that's a modularity violation.
PathNameOpenFile() is called by very few callers at the moment, all of
them very low level (e.g. md.c), but you're using it within buffile.c
to open a path to the file that you obtain from shared memory
directly. This is buggy because the following code won't be reached in
workers that call your BufFileImport() function:

/* Mark it for deletion at close */
VfdCache[file].fdstate |= FD_TEMPORARY;

/* Register it with the current resource owner */
if (!interXact)
{
VfdCache[file].fdstate |= FD_XACT_TEMPORARY;

ResourceOwnerEnlargeFiles(CurrentResourceOwner);
ResourceOwnerRememberFile(CurrentResourceOwner, file);
VfdCache[file].resowner = CurrentResourceOwner;

/* ensure cleanup happens at eoxact */
have_xact_temporary_files = true;
}

Certainly, you don't want the "Mark it for deletion at close" bit.
Deletion should not happen at eoxact for non-owners-but-sharers
(within FileClose()), but you *do* want CleanupTempFiles() to call
FileClose() for the virtual file descriptors you've opened in the
backend, to do some other cleanup. In general, you want to buy into
resource ownership for workers. As things stand, I think that this
will leak virtual file descriptors. That's really well hidden because
there is a similar CleanupTempFiles() call at proc exit, I think.
(Didn't take the time to make sure that that's what masked problems.
I'm sure that you want minimal divergence with serial cases,
resource-ownership-wise, in any case.)

Instead of all this, I suggest copying some of my changes to fd.c, so
that resource ownership within fd.c differentiates between a vfd that
is owned by the backend in the conventional sense, including having a
need to delete at eoxact, as well as a lesser form of ownership where
deletion should not happen. Maybe you'll end up using my BufFileUnify
interface [1]https://wiki.postgresql.org/wiki/Parallel_External_Sort#buffile.c.2C_and_BufFile_unification -- Peter Geoghegan within workers (instead of just within the leader, as
with parallel tuplesort), and have it handle all of that for you.
Currently, that would mean that there'd be an unused/0 sized "local"
segment for the unified BufFile, but I was thinking of making that not
happen unless and until a new segment is actually needed, so even that
minor wart wouldn't necessarily affect you.

Some assorted notes on the status: I need to do some thinking about
the file cleanup logic: both explicit deletes at the earliest possible
time, and failure/error paths. Currently the creator of each file is
responsible for cleaning it up, but I guess if the creator aborts
early the file disappears underneath the others' feet, and then I
guess they might raise a confusing error report that races against the
root cause error report; I'm looking into that. Rescans and skew
buckets not finished yet.

The rescan code path seems to segfault when the regression tests are
run. There is a NULL pointer dereference here:

@@ -985,6 +1855,14 @@ ExecReScanHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
node->hj_JoinState = HJ_BUILD_HASHTABLE;

+           if (HashJoinTableIsShared(node->hj_HashTable))
+           {
+               /* Coordinate a rewind to the shared hash table creation phase. */
+               BarrierWaitSet(&hashNode->shared_table_data->barrier,
+                              PHJ_PHASE_BEGINNING,
+                              WAIT_EVENT_HASHJOIN_REWINDING3);
+           }
+

Clearly, HashJoinTableIsShared() should not be called when its
argument (in this case node->hj_HashTable) is NULL.

In general, I think you should try to set expectations about what
happens when the regression tests run up front, because that's usually
the first thing reviewers do.

Various compiler warnings on my system:

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHash.c:1376:7:
warning: variable ‘size_before_shrink’ set but not used
[-Wunused-but-set-variable]
Size size_before_shrink = 0;
^
...

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:
In function ‘ExecHashJoinCloseBatch’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:1548:28:
warning: variable ‘participant’ set but not used
[-Wunused-but-set-variable]
HashJoinParticipantState *participant;
^
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:
In function ‘ExecHashJoinRewindBatches’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:1587:23:
warning: variable ‘batch_reader’ set but not used
[-Wunused-but-set-variable]
HashJoinBatchReader *batch_reader;
^

Is this change really needed?:

--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
#include "executor/nodeSeqscan.h"
#include "utils/rel.h"

+#include <unistd.h>
+
static void InitScanRelation(SeqScanState *node, EState *estate, int eflags);
static TupleTableSlot *SeqNext(SeqScanState *node);

That's all I have for now...

[1]: https://wiki.postgresql.org/wiki/Parallel_External_Sort#buffile.c.2C_and_BufFile_unification -- Peter Geoghegan
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Peter Geoghegan (#11)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Jan 10, 2017 at 8:56 PM, Peter Geoghegan <pg@heroku.com> wrote:

Instead of all this, I suggest copying some of my changes to fd.c, so
that resource ownership within fd.c differentiates between a vfd that
is owned by the backend in the conventional sense, including having a
need to delete at eoxact, as well as a lesser form of ownership where
deletion should not happen.

If multiple processes are using the same file via the BufFile
interface, I think that it is absolutely necessary that there should
be a provision to track the "attach count" of the BufFile. Each
process that reaches EOXact decrements the attach count and when it
reaches 0, the process that reduced it to 0 removes the BufFile. I
think anything that's based on the notion that leaders will remove
files and workers won't is going to be fragile and limiting, and I am
going to push hard against any such proposal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Robert Haas (#12)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 10:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 10, 2017 at 8:56 PM, Peter Geoghegan <pg@heroku.com> wrote:

Instead of all this, I suggest copying some of my changes to fd.c, so
that resource ownership within fd.c differentiates between a vfd that
is owned by the backend in the conventional sense, including having a
need to delete at eoxact, as well as a lesser form of ownership where
deletion should not happen.

If multiple processes are using the same file via the BufFile
interface, I think that it is absolutely necessary that there should
be a provision to track the "attach count" of the BufFile. Each
process that reaches EOXact decrements the attach count and when it
reaches 0, the process that reduced it to 0 removes the BufFile. I
think anything that's based on the notion that leaders will remove
files and workers won't is going to be fragile and limiting, and I am
going to push hard against any such proposal.

Okay. My BufFile unification approach happens to assume that backends
clean up after themselves, but that isn't a ridged assumption (of
course, these are always temp files, so we reason about them as temp
files). It could be based on a refcount fairly easily, such that, as
you say here, deletion of files occurs within workers (that "own" the
files) only as a consequence of their being the last backend with a
reference, that must therefore "turn out the lights" (delete the
file). That seems consistent with what I've done within fd.c, and what
I suggested to Thomas (that he more or less follow that approach).
You'd probably still want to throw an error when workers ended up not
deleting BufFile segments they owned, though, at least for parallel
tuplesort.

This idea is something that's much more limited than the
SharedTemporaryFile() API that you sketched on the parallel sort
thread, because it only concerns resource management, and not how to
make access to the shared file concurrency safe in any special,
standard way. I think that this resource management is something that
should be managed by buffile.c (and the temp file routines within fd.c
that are morally owned by buffile.c, their only caller). It shouldn't
be necessary for a client of this new infrastructure, such as parallel
tuplesort or parallel hash join, to know anything about file paths.
Instead, they should be passing around some kind of minimal
private-to-buffile state in shared memory that coordinates backends
participating in BufFile unification. Private state created by
buffile.c, and passed back to buffile.c. Everything should be
encapsulated within buffile.c, IMV, making parallel implementations as
close as possible to their serial implementations.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Peter Geoghegan (#13)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 2:20 PM, Peter Geoghegan <pg@heroku.com> wrote:

You'd probably still want to throw an error when workers ended up not
deleting BufFile segments they owned, though, at least for parallel
tuplesort.

Don't see why.

This idea is something that's much more limited than the
SharedTemporaryFile() API that you sketched on the parallel sort
thread, because it only concerns resource management, and not how to
make access to the shared file concurrency safe in any special,
standard way.

Actually, I only intended that sketch to be about resource management.
Sounds like I didn't explain very well.

Instead, they should be passing around some kind of minimal
private-to-buffile state in shared memory that coordinates backends
participating in BufFile unification. Private state created by
buffile.c, and passed back to buffile.c. Everything should be
encapsulated within buffile.c, IMV, making parallel implementations as
close as possible to their serial implementations.

That seems reasonable although I haven't studied the details carefully as yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Peter Geoghegan (#13)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 11:20 AM, Peter Geoghegan <pg@heroku.com> wrote:

If multiple processes are using the same file via the BufFile
interface, I think that it is absolutely necessary that there should
be a provision to track the "attach count" of the BufFile. Each
process that reaches EOXact decrements the attach count and when it
reaches 0, the process that reduced it to 0 removes the BufFile. I
think anything that's based on the notion that leaders will remove
files and workers won't is going to be fragile and limiting, and I am
going to push hard against any such proposal.

Okay. My BufFile unification approach happens to assume that backends
clean up after themselves, but that isn't a ridged assumption (of
course, these are always temp files, so we reason about them as temp
files).

Also, to be clear, and to avoid confusion: I don't think anyone wants
an approach "that's based on the notion that leaders will remove files
and workers won't". All that has been suggested is that the backend
that creates the file should be responsible for deleting the file, by
definition. And, that any other backend that may have files owned by
another backend must be sure to not try to access them after the owner
deletes them. (Typically, that would be ensured by some barrier
condition, some dependency, inherent to how the parallel operation is
implemented.)

I will implement the reference count thing.
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Robert Haas (#14)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 12:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 11, 2017 at 2:20 PM, Peter Geoghegan <pg@heroku.com> wrote:

You'd probably still want to throw an error when workers ended up not
deleting BufFile segments they owned, though, at least for parallel
tuplesort.

Don't see why.

Simply because that's not expected as things stand -- why should the
file go away in that context? (Admittedly, that doesn't seem like an
excellent reason now.)

I actually like the idea of a reference count, the more I think about
it, since it doesn't actually have any tension with my original idea
of ownership. If something like a randomAccess parallel tuplesort
leader merge needs to write new segments (which it almost certainly
*won't* anyway, due to my recent V7 changes), then it can still own
those new segments itself, alone, and delete them on its own in the
manner of conventional temp files, because we can still restrict the
shared refcount mechanism to the deletion of "initial" segments. The
refcount == 0 deleter only deletes those initial segments, and not any
same-BufFile segments that might have been added (added to append to
our unified BufFile within leader). I think that parallel hash join
won't use this at all, and, as I said, it's only a theoretical
requirement for parallel tuplesort, which will generally recycle
blocks from worker temp files for its own writes all the time for
randomAccess cases, the only cases that ever write within logtape.c.

So, the only BufFile shared state needed, that must be maintained over
time, is the refcount variable itself. The size of the "initial"
BufFile (from which we derive number of new segments during
unification) is passed, but it doesn't get maintained in shared
memory. BufFile size remains a one way, one time message needed during
unification. I only really need to tweak things in fd.c temp routines
to make all this work.

This is something I like because it makes certain theoretically useful
things easier. Say, for example, we wanted to have tuplesort workers
merge worker final materialized tapes (their final output), in order
to arrange for the leader to have fewer than $NWORKER runs to merge at
the end -- that's made easier by the refcount stuff. (I'm still not
convinced that that's actually going to make CREATE INDEX faster.
Still, it should, on general principle, be easy to write a patch that
makes it happen -- a good overall design should leave things so that
writing that prototype patch is easy.)

This idea is something that's much more limited than the
SharedTemporaryFile() API that you sketched on the parallel sort
thread, because it only concerns resource management, and not how to
make access to the shared file concurrency safe in any special,
standard way.

Actually, I only intended that sketch to be about resource management.
Sounds like I didn't explain very well.

I'm glad to hear that, because I was very puzzled by what you said. I
guess I was thrown off by "shared read pointers". I don't want to get
into the business of flushing out dirty buffers, or making sure that
every backend stays in lockstep about what the total size of the
BufFile needs to be. It's so much simpler to just have clear
"barriers" for each parallel operation, where backends present a large
amount of immutable state to one other backend at the end, and tells
it how big its BufFile is only once. (It's not quite immutable, since
randomAccess recycle of temp files can happen within logtape.c, but
the point is that there should be very little back and forth -- that
needs to be severely restricted.)

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Thomas Munro

thomas.munro@enterprisedb.com

about 9 years ago

In reply to: Peter Geoghegan (#11)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 2:56 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Jan 6, 2017 at 12:01 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here is a new WIP patch. I have plenty of things to tidy up (see note
at end), but the main ideas are now pretty clear and I'd appreciate
some feedback.

I have some review feedback for your V3. I've chosen to start with the
buffile.c stuff, since of course it might share something with my
parallel tuplesort patch. This isn't comprehensive, but I will have
more comprehensive feedback soon.

Thanks!

I'm not surprised that you've generally chosen to make shared BufFile
management as simple as possible, with no special infrastructure other
than the ability to hold open other backend temp files concurrently
within a worker, and no writing to another worker's temp file, or
shared read pointer. As you put it, everything is immutable. I
couldn't see much opportunity for adding a lot of infrastructure that
wasn't written explicitly as parallel hash join code/infrastructure.
My sense is that that was a good decision. I doubted that you'd ever
want some advanced, generic shared BufFile thing with multiple read
pointers, built-in cache coherency, etc. (Robert seemed to think that
you'd go that way, though.)

Right, this is extremely minimalist infrastructure. fd.c is
unchanged. buffile.c only gains the power to export/import read-only
views of BufFiles. There is no 'unification' of BufFiles: each hash
join participant simply reads from the buffile it wrote, and then
imports and reads from its peers' BufFiles, until all are exhausted;
so the 'unification' is happening in caller code which knows about the
set of participants and manages shared read positions. Clearly there
are some ownership/cleanup issues to straighten out, but I think those
problems are fixable (probably involving refcounts).

I'm entirely willing to throw that away and use the unified BufFile
concept, if it can be extended to support multiple readers of the
data, where every participant unifies the set of files. I have so far
assumed that it would be most efficient for each participant to read
from the file that it wrote before trying to read from files written
by other participants. I'm reading your patch now; more soon.

Anyway, some more specific observations:

* ISTM that this is the wrong thing for shared BufFiles:
+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{
...

+ file->isInterXact = true; /* prevent cleanup by this backend */

There is only one user of isInterXact = true BufFiles at present,
tuplestore.c. It, in turn, only does so for cases that require
persistent tuple stores. A quick audit of these tuplestore.c callers
show this to just be cursor support code within portalmem.c. Here is
the relevant tuplestore_begin_heap() rule that that code adheres to,
unlike the code I've quoted above:

* interXact: if true, the files used for on-disk storage persist beyond the
* end of the current transaction. NOTE: It's the caller's responsibility to
* create such a tuplestore in a memory context and resource owner that will
* also survive transaction boundaries, and to ensure the tuplestore is closed
* when it's no longer wanted.

Hmm. Yes, that is an entirely bogus use of isInterXact. I am
thinking about how to fix that with refcounts.

I don't think it's right for buffile.c to know anything about file
paths directly -- I'd say that that's a modularity violation.
PathNameOpenFile() is called by very few callers at the moment, all of
them very low level (e.g. md.c), but you're using it within buffile.c
to open a path to the file that you obtain from shared memory

Hmm. I'm not seeing the modularity violation. buffile.c uses
interfaces already exposed by fd.c to do this: OpenTemporaryFile,
then FilePathName to find the path, then PathNameOpenFile to open from
another process. I see that your approach instead has client code
provide more meta data so that things can be discovered, which may
well be a much better idea.

directly. This is buggy because the following code won't be reached in
workers that call your BufFileImport() function:

/* Mark it for deletion at close */
VfdCache[file].fdstate |= FD_TEMPORARY;

/* Register it with the current resource owner */
if (!interXact)
{
VfdCache[file].fdstate |= FD_XACT_TEMPORARY;

ResourceOwnerEnlargeFiles(CurrentResourceOwner);
ResourceOwnerRememberFile(CurrentResourceOwner, file);
VfdCache[file].resowner = CurrentResourceOwner;

/* ensure cleanup happens at eoxact */
have_xact_temporary_files = true;
}

Right, that is a problem. A refcount mode could fix that; virtual
file descriptors would be closed in every backend using the current
resource owner, and the files would be deleted when the last one turns
out the lights.

Certainly, you don't want the "Mark it for deletion at close" bit.
Deletion should not happen at eoxact for non-owners-but-sharers
(within FileClose()), but you *do* want CleanupTempFiles() to call
FileClose() for the virtual file descriptors you've opened in the
backend, to do some other cleanup. In general, you want to buy into
resource ownership for workers. As things stand, I think that this
will leak virtual file descriptors. That's really well hidden because
there is a similar CleanupTempFiles() call at proc exit, I think.
(Didn't take the time to make sure that that's what masked problems.
I'm sure that you want minimal divergence with serial cases,
resource-ownership-wise, in any case.)

Instead of all this, I suggest copying some of my changes to fd.c, so
that resource ownership within fd.c differentiates between a vfd that
is owned by the backend in the conventional sense, including having a
need to delete at eoxact, as well as a lesser form of ownership where
deletion should not happen. Maybe you'll end up using my BufFileUnify
interface [1] within workers (instead of just within the leader, as
with parallel tuplesort), and have it handle all of that for you.
Currently, that would mean that there'd be an unused/0 sized "local"
segment for the unified BufFile, but I was thinking of making that not
happen unless and until a new segment is actually needed, so even that
minor wart wouldn't necessarily affect you.

Ok, I'm studying that code now.

Some assorted notes on the status: I need to do some thinking about
the file cleanup logic: both explicit deletes at the earliest possible
time, and failure/error paths. Currently the creator of each file is
responsible for cleaning it up, but I guess if the creator aborts
early the file disappears underneath the others' feet, and then I
guess they might raise a confusing error report that races against the
root cause error report; I'm looking into that. Rescans and skew
buckets not finished yet.

The rescan code path seems to segfault when the regression tests are
run. There is a NULL pointer dereference here:
@@ -985,6 +1855,14 @@ ExecReScanHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
node->hj_JoinState = HJ_BUILD_HASHTABLE;
+           if (HashJoinTableIsShared(node->hj_HashTable))
+           {
+               /* Coordinate a rewind to the shared hash table creation phase. */
+               BarrierWaitSet(&hashNode->shared_table_data->barrier,
+                              PHJ_PHASE_BEGINNING,
+                              WAIT_EVENT_HASHJOIN_REWINDING3);
+           }
+
Clearly, HashJoinTableIsShared() should not be called when its
argument (in this case node->hj_HashTable) is NULL.

In general, I think you should try to set expectations about what
happens when the regression tests run up front, because that's usually
the first thing reviewers do.

Apologies, poor form. That block can be commented out for now because
rescan support is obviously incomplete, and I didn't mean to post it
that way. Doing so reveals two remaining test failures: "join" and
"rowsecurity" managed to lose a couple of rows. Oops. I will figure
out what I broke and have a fix for that in my next version.

Various compiler warnings on my system:

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHash.c:1376:7:
warning: variable ‘size_before_shrink’ set but not used
[-Wunused-but-set-variable]
Size size_before_shrink = 0;
^

In this case it was only used in dtrace builds; I will make sure any
such code is compiled out when in non-dtrace builds.

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:
In function ‘ExecHashJoinCloseBatch’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:1548:28:
warning: variable ‘participant’ set but not used
[-Wunused-but-set-variable]
HashJoinParticipantState *participant;
^
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:
In function ‘ExecHashJoinRewindBatches’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/executor/nodeHashjoin.c:1587:23:
warning: variable ‘batch_reader’ set but not used
[-Wunused-but-set-variable]
HashJoinBatchReader *batch_reader;
^

Is this change really needed?:
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
#include "executor/nodeSeqscan.h"
#include "utils/rel.h"
+#include <unistd.h>
+
static void InitScanRelation(SeqScanState *node, EState *estate, int eflags);
static TupleTableSlot *SeqNext(SeqScanState *node);

Right, will clean up.

That's all I have for now...

Thanks! I'm away from my computer for a couple of days but will have
a new patch series early next week, and hope to have a better handle
on what's involved in adopting the 'unification' approach here
instead.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Rafia Sabih

rafia.sabih@enterprisedb.com

about 9 years ago

In reply to: Thomas Munro (#17)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Jan 12, 2017 at 9:07 AM, Thomas Munro <thomas.munro@enterprisedb.com

wrote:

On Wed, Jan 11, 2017 at 2:56 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Jan 6, 2017 at 12:01 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here is a new WIP patch. I have plenty of things to tidy up (see note
at end), but the main ideas are now pretty clear and I'd appreciate
some feedback.

I have some review feedback for your V3. I've chosen to start with the
buffile.c stuff, since of course it might share something with my
parallel tuplesort patch. This isn't comprehensive, but I will have
more comprehensive feedback soon.

Thanks!

I'm not surprised that you've generally chosen to make shared BufFile
management as simple as possible, with no special infrastructure other
than the ability to hold open other backend temp files concurrently
within a worker, and no writing to another worker's temp file, or
shared read pointer. As you put it, everything is immutable. I
couldn't see much opportunity for adding a lot of infrastructure that
wasn't written explicitly as parallel hash join code/infrastructure.
My sense is that that was a good decision. I doubted that you'd ever
want some advanced, generic shared BufFile thing with multiple read
pointers, built-in cache coherency, etc. (Robert seemed to think that
you'd go that way, though.)

Right, this is extremely minimalist infrastructure. fd.c is
unchanged. buffile.c only gains the power to export/import read-only
views of BufFiles. There is no 'unification' of BufFiles: each hash
join participant simply reads from the buffile it wrote, and then
imports and reads from its peers' BufFiles, until all are exhausted;
so the 'unification' is happening in caller code which knows about the
set of participants and manages shared read positions. Clearly there
are some ownership/cleanup issues to straighten out, but I think those
problems are fixable (probably involving refcounts).

I'm entirely willing to throw that away and use the unified BufFile
concept, if it can be extended to support multiple readers of the
data, where every participant unifies the set of files. I have so far
assumed that it would be most efficient for each participant to read
from the file that it wrote before trying to read from files written
by other participants. I'm reading your patch now; more soon.
Anyway, some more specific observations:

* ISTM that this is the wrong thing for shared BufFiles:
+BufFile *
+BufFileImport(BufFileDescriptor *descriptor)
+{
...

+ file->isInterXact = true; /* prevent cleanup by this backend */

There is only one user of isInterXact = true BufFiles at present,
tuplestore.c. It, in turn, only does so for cases that require
persistent tuple stores. A quick audit of these tuplestore.c callers
show this to just be cursor support code within portalmem.c. Here is
the relevant tuplestore_begin_heap() rule that that code adheres to,
unlike the code I've quoted above:

* interXact: if true, the files used for on-disk storage persist beyond
the

* end of the current transaction. NOTE: It's the caller's

responsibility to

* create such a tuplestore in a memory context and resource owner that

will

* also survive transaction boundaries, and to ensure the tuplestore is

closed

* when it's no longer wanted.

Hmm. Yes, that is an entirely bogus use of isInterXact. I am
thinking about how to fix that with refcounts.

I don't think it's right for buffile.c to know anything about file
paths directly -- I'd say that that's a modularity violation.
PathNameOpenFile() is called by very few callers at the moment, all of
them very low level (e.g. md.c), but you're using it within buffile.c
to open a path to the file that you obtain from shared memory

Hmm. I'm not seeing the modularity violation. buffile.c uses
interfaces already exposed by fd.c to do this: OpenTemporaryFile,
then FilePathName to find the path, then PathNameOpenFile to open from
another process. I see that your approach instead has client code
provide more meta data so that things can be discovered, which may
well be a much better idea.

directly. This is buggy because the following code won't be reached in
workers that call your BufFileImport() function:

/* Mark it for deletion at close */
VfdCache[file].fdstate |= FD_TEMPORARY;

/* Register it with the current resource owner */
if (!interXact)
{
VfdCache[file].fdstate |= FD_XACT_TEMPORARY;

ResourceOwnerEnlargeFiles(CurrentResourceOwner);
ResourceOwnerRememberFile(CurrentResourceOwner, file);
VfdCache[file].resowner = CurrentResourceOwner;

/* ensure cleanup happens at eoxact */
have_xact_temporary_files = true;
}

Right, that is a problem. A refcount mode could fix that; virtual
file descriptors would be closed in every backend using the current
resource owner, and the files would be deleted when the last one turns
out the lights.

Certainly, you don't want the "Mark it for deletion at close" bit.
Deletion should not happen at eoxact for non-owners-but-sharers
(within FileClose()), but you *do* want CleanupTempFiles() to call
FileClose() for the virtual file descriptors you've opened in the
backend, to do some other cleanup. In general, you want to buy into
resource ownership for workers. As things stand, I think that this
will leak virtual file descriptors. That's really well hidden because
there is a similar CleanupTempFiles() call at proc exit, I think.
(Didn't take the time to make sure that that's what masked problems.
I'm sure that you want minimal divergence with serial cases,
resource-ownership-wise, in any case.)

Instead of all this, I suggest copying some of my changes to fd.c, so
that resource ownership within fd.c differentiates between a vfd that
is owned by the backend in the conventional sense, including having a
need to delete at eoxact, as well as a lesser form of ownership where
deletion should not happen. Maybe you'll end up using my BufFileUnify
interface [1] within workers (instead of just within the leader, as
with parallel tuplesort), and have it handle all of that for you.
Currently, that would mean that there'd be an unused/0 sized "local"
segment for the unified BufFile, but I was thinking of making that not
happen unless and until a new segment is actually needed, so even that
minor wart wouldn't necessarily affect you.

Ok, I'm studying that code now.
Some assorted notes on the status: I need to do some thinking about
the file cleanup logic: both explicit deletes at the earliest possible
time, and failure/error paths. Currently the creator of each file is
responsible for cleaning it up, but I guess if the creator aborts
early the file disappears underneath the others' feet, and then I
guess they might raise a confusing error report that races against the
root cause error report; I'm looking into that. Rescans and skew
buckets not finished yet.

The rescan code path seems to segfault when the regression tests are
run. There is a NULL pointer dereference here:
@@ -985,6 +1855,14 @@ ExecReScanHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
node->hj_JoinState = HJ_BUILD_HASHTABLE;
+           if (HashJoinTableIsShared(node->hj_HashTable))
+           {
+               /* Coordinate a rewind to the shared hash table
creation phase. */
+               BarrierWaitSet(&hashNode->shared_table_data->barrier,
+                              PHJ_PHASE_BEGINNING,
+                              WAIT_EVENT_HASHJOIN_REWINDING3);
+           }
+
Clearly, HashJoinTableIsShared() should not be called when its
argument (in this case node->hj_HashTable) is NULL.

In general, I think you should try to set expectations about what
happens when the regression tests run up front, because that's usually
the first thing reviewers do.
Apologies, poor form. That block can be commented out for now because
rescan support is obviously incomplete, and I didn't mean to post it
that way. Doing so reveals two remaining test failures: "join" and
"rowsecurity" managed to lose a couple of rows. Oops. I will figure
out what I broke and have a fix for that in my next version.

Various compiler warnings on my system:

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/

executor/nodeHash.c:1376:7:

warning: variable ‘size_before_shrink’ set but not used
[-Wunused-but-set-variable]
Size size_before_shrink = 0;
^

In this case it was only used in dtrace builds; I will make sure any
such code is compiled out when in non-dtrace builds.

/home/pg/pgbuild/builds/root/../../postgresql/src/backend/

executor/nodeHashjoin.c:

In function ‘ExecHashJoinCloseBatch’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/

executor/nodeHashjoin.c:1548:28:

warning: variable ‘participant’ set but not used
[-Wunused-but-set-variable]
HashJoinParticipantState *participant;
^
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/

executor/nodeHashjoin.c:

In function ‘ExecHashJoinRewindBatches’:
/home/pg/pgbuild/builds/root/../../postgresql/src/backend/

executor/nodeHashjoin.c:1587:23:
warning: variable ‘batch_reader’ set but not used
[-Wunused-but-set-variable]
HashJoinBatchReader *batch_reader;
^

Is this change really needed?:
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,6 +31,8 @@
#include "executor/nodeSeqscan.h"
#include "utils/rel.h"
+#include <unistd.h>
+
static void InitScanRelation(SeqScanState *node, EState *estate, int
eflags);

static TupleTableSlot *SeqNext(SeqScanState *node);

Right, will clean up.

That's all I have for now...

Thanks! I'm away from my computer for a couple of days but will have
a new patch series early next week, and hope to have a better handle
on what's involved in adopting the 'unification' approach here
instead.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Hi Thomas,
I was trying to analyse the performance of TPC-H queries with your patch
and came across following results,
Q9 and Q21 were crashing, both of them had following bt in core dump (I
thought it might be helpful),

#0 0x0000000010757da4 in pfree (pointer=0x3fff78d11000) at mcxt.c:1012
#1 0x000000001032c574 in ExecHashIncreaseNumBatches
(hashtable=0x1003af6da60) at nodeHash.c:1124
#2 0x000000001032d518 in ExecHashTableInsert (hashtable=0x1003af6da60,
slot=0x1003af695c0, hashvalue=2904801109, preload=1 '\001') at
nodeHash.c:1700
#3 0x0000000010330fd4 in ExecHashJoinPreloadNextBatch
(hjstate=0x1003af39118) at nodeHashjoin.c:886
#4 0x00000000103301fc in ExecHashJoin (node=0x1003af39118) at
nodeHashjoin.c:376
#5 0x0000000010308644 in ExecProcNode (node=0x1003af39118) at
execProcnode.c:490
#6 0x000000001031f530 in fetch_input_tuple (aggstate=0x1003af38910) at
nodeAgg.c:587
#7 0x0000000010322b50 in agg_fill_hash_table (aggstate=0x1003af38910) at
nodeAgg.c:2304
#8 0x000000001032239c in ExecAgg (node=0x1003af38910) at nodeAgg.c:1942
#9 0x0000000010308694 in ExecProcNode (node=0x1003af38910) at
execProcnode.c:509
#10 0x0000000010302a1c in ExecutePlan (estate=0x1003af37fa0,
planstate=0x1003af38910, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x1003af19390) at execMain.c:1587

In case you want to know, I was using TPC-H with 20 scale factor. Please
let me know if you want anymore information on this.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#19

Peter Geoghegan

pg@heroku.com

almost 9 years ago

In reply to: Thomas Munro (#17)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Jan 11, 2017 at 7:37 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Hmm. Yes, that is an entirely bogus use of isInterXact. I am
thinking about how to fix that with refcounts.

Cool. As I said, the way I'd introduce refcounts would not be very
different from what I've already done -- there'd still be a strong
adherence to the use of resource managers to clean-up, with that
including exactly one particular backend doing the extra step of
deletion. The refcount only changes which backend does that extra step
in corner cases, which is conceptually a very minor change.

I don't think it's right for buffile.c to know anything about file
paths directly -- I'd say that that's a modularity violation.
PathNameOpenFile() is called by very few callers at the moment, all of
them very low level (e.g. md.c), but you're using it within buffile.c
to open a path to the file that you obtain from shared memory

Hmm. I'm not seeing the modularity violation. buffile.c uses
interfaces already exposed by fd.c to do this: OpenTemporaryFile,
then FilePathName to find the path, then PathNameOpenFile to open from
another process. I see that your approach instead has client code
provide more meta data so that things can be discovered, which may
well be a much better idea.

Indeed, my point was that the metadata thing would IMV be better.
buffile.c shouldn't need to know about file paths, etc. Instead,
caller should pass BufFileImport()/BufFileUnify() simple private state
sufficient for routine to discover all details itself, based on a
deterministic scheme. In my tuplesort patch, that piece of state is:

 /*
+ * BufFileOp is an identifier for a particular parallel operation involving
+ * temporary files.  Parallel temp file operations must be discoverable across
+ * processes based on these details.
+ *
+ * These fields should be set by BufFileGetIdent() within leader process.
+ * Identifier BufFileOp makes temp files from workers discoverable within
+ * leader.
+ */
+typedef struct BufFileOp
+{
+   /*
+    * leaderPid is leader process PID.
+    *
+    * tempFileIdent is an identifier for a particular temp file (or parallel
+    * temp file op) for the leader.  Needed to distinguish multiple parallel
+    * temp file operations within a given leader process.
+    */
+   int         leaderPid;
+   long        tempFileIdent;
+} BufFileOp;
+

Right, that is a problem. A refcount mode could fix that; virtual
file descriptors would be closed in every backend using the current
resource owner, and the files would be deleted when the last one turns
out the lights.

Yeah. That's basically what the BufFile unification process can
provide you with (or will, once I get around to implementing the
refcount thing, which shouldn't be too hard). As already noted, I'll
also want to make it defer creation of a leader-owned segment, unless
and until that proves necessary, which it never will for hash join.

Perhaps I should make superficial changes to unification in my patch
to suit your work, like rename the field BufFileOp.leaderPid to
BufFileOp.ownerPid, without actually changing any behaviors, except as
noted in the last paragraph. Since you only require that backends be
able to open up some other backend's temp file themselves for a short
while, that gives you everything you need. You'll be doing unification
in backends, and not just within the leader as in the tuplesort patch,
I believe, but that's just fine. All that matters is that you present
all data at once to a consuming backend via unification (since you
treat temp file contents as immutable, this will be true for hash
join, just as it is for tuplesort).

There is a good argument against my making such a tweak, however,
which is that maybe it's clearer to DBAs what's going on if temp file
names have the leader PID in them for all operations. So, maybe
BufFileOp.leaderPid isn't renamed to BufFileOp.ownerPid by me;
instead, you always make it the leader pid, while at the same time
having the leader dole out BufFileOp.tempFileIdent identifiers to each
worker as needed (see how I generate BufFileOps for an idea of what I
mean if it's not immediately clear). That's also an easy change, or at
least will be once the refcount thing is added.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#19)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Fri, Jan 13, 2017 at 2:36 PM, Peter Geoghegan <pg@heroku.com> wrote:

[...]
Yeah. That's basically what the BufFile unification process can
provide you with (or will, once I get around to implementing the
refcount thing, which shouldn't be too hard). As already noted, I'll
also want to make it defer creation of a leader-owned segment, unless
and until that proves necessary, which it never will for hash join.

Hi Peter,

I have broken this up into a patch series, harmonised the private vs
shared hash table code paths better and fixed many things including
the problems with rescans and regression tests mentioned upthread.
You'll see that one of the patches is that throwaway BufFile
import/export facility, which I'll replace with your code as
discussed.

The three 'refactor' patches change the existing hash join code to
work in terms of chunks in more places. These may be improvements in
their own right, but mainly they pave the way for parallelism. The
later patches introduce single-batch and then multi-batch shared
tables.

The patches in the attached tarball are:

0001-nail-down-regression-test-row-order-v4.patch:

A couple of regression tests would fail with later refactoring that
changes the order of unmatched rows emitted by hash joins. So first,
let's fix that by adding ORDER BY in those places, without any code
changes.

0002-hj-add-dtrace-probes-v4.patch:

Before making any code changes, let's add some dtrace probes so that
we can measure time spent doing different phases of hash join work
before and after the later changes. The main problem with the probes
as I have them here (and the extra probes inserted by later patches in
the series) is that interesting query plans contain multiple hash
joins so these get all mixed up when you're trying to measure stuff,
so perhaps I should pass executor node IDs into all the probes. More
on this later. (If people don't want dtrace probes in the executor,
I'm happy to omit them and maintain that kind of thing locally for my
own testing purposes...)

0003-hj-refactor-memory-accounting-v4.patch:

Modify the existing hash join code to work in terms of chunks when
estimating and later tracking memory usage. This is probably more
accurate than the current tuple-based approach, because it tries to
take into account the space used by chunk headers and the wasted space
in chunks. In practice the difference is probably small, but it's
arguably more accurate; I did this because I need chunk-based
accounting the later patches. Also, make HASH_CHUNK_SIZE the actual
size of allocated chunks (ie the header information is included in
that size so we allocate exactly 32KB, not 32KB + a bit, for the
benefit of the dsa allocator which otherwise finishes up allocating
36KB).

0004-hj-refactor-batch-increases-v4.patch:

Modify the existing hash join code to detect work_mem exhaustion at
the point where chunks are allocated, instead of checking after every
tuple insertion. This matches the logic used for estimating, and more
importantly allows for some parallelism in later patches.

0005-hj-refactor-unmatched-v4.patch:

Modifies the existing hash join code to handle unmatched tuples in
right/full joins chunk-by-chunk. This is probably a cache-friendlier
scan order anyway, but the real goal is to provide a natural grain for
parallelism in a later patch.

0006-hj-barrier-v4.patch:

The patch from a nearby thread previously presented as a dependency of
this project. It might as well be considered part of this patch
series.

0007-hj-exec-detach-node-v4.patch

By the time ExecEndNode() runs in workers, ExecShutdownNode() has
already run. That's done on purpose because, for example, the hash
table needs to survive longer than the parallel environment to allow
EXPLAIN to peek at it. But it means that the Gather node has thrown
out the shared memory before any parallel-aware node below it gets to
run its Shutdown and End methods. So I invented ExecDetachNode()
which runs before ExecShutdownNode(), giving parallel-aware nodes a
chance to say goodbye before their shared memory vanishes. Better
ideas?

0008-hj-shared-single-batch-v4.patch:

Introduces hash joins with "Shared Hash" and "Parallel Shared Hash"
nodes, for single-batch joins only. If the planner has a partial
inner plan, it'll pick a Parallel Shared Hash plan to divide that over
K participants. Failing that, if the planner has a parallel-safe
inner plan and thinks that it can avoid batching by using work_mem * K
memory (shared by all K participants), it will now use a Shared Hash.
Otherwise it'll typically use a Hash plan as before. Without the
later patches, it will blow through work_mem * K if it turns out to
have underestimated the hash table size, because it lacks
infrastructure for dealing with batches.

The trickiest thing at this point in the series is that participants
(workers and the leader) can show up at any time, so there are three
places that provide synchronisation with a parallel hash join that is
already in progress. Those can be seen in ExecHashTableCreate,
MultiExecHash and ExecHashJoin (HJ_BUILD_HASHTABLE case).

0009-hj-shared-buffile-strawman-v4.patch:

Simple code for sharing BufFiles between backends. This is standing
in for Peter G's BufFile sharing facility with refcount-based cleanup.

0010-hj-shared-multi-batch-v4.patch:

Adds support for multi-batch joins with shared hash tables. At this
point, more complications appear: deadlock avoidance with the leader,
batch file sharing and coordinated batch number increases (shrinking
the hash table) while building or loading.

Some thoughts:

* Although this patch series adds a ton of wait points, in the common
case of a single batch inner join there is effectively only one:
participants wait for PHJ_PHASE_BUILDING to end and PHJ_PHASE_PROBING
to begin (resizing the hash table in between if necessary). For a
single batch outer join, there is one more wait point: participants
wait for PHJ_PHASE_PROBING to end so that PHJ_PHASE_UNMATCHED can
begin. The length of the wait for PHJ_PHASE_BUILDING to finish is
limited by the grain of the scattered data being loaded into the hash
table: if the source of parallelism is Parallel Seq Scan, then the
worst case scenario is that you run out of tuples to insert and
twiddle your thumbs while some other participant chews on the final
pageful of tuples. The wait for PHJ_PHASE_UNMATCHED (if applicable)
is similarly limited by the time it takes for the slowest participant
to scan the match bits of one chunk of tuples. All other phases and
associated wait points relate to multi-batch joins: either running out
of work_mem and needing to shrink the hash table, or coordinating
loading and various batches; in other words, ugly synchronisation only
enters the picture at the point where hash join starts doing IO
because you don't have enough work_mem.

* I wrestled with rescans for a long time; I think I have it right
now! The key thing to understand is that only the leader runs
ExecHashJoinReScan; new workers will be created for the next scan, so
the leader is able to get the barrier into the right state (attached
and fast-forwarded to PHJ_PHASE_PROBING if reusing the hash table,
detached and in the initial phase PHJ_PHASE_BEGINNING if we need to
recreate it).

* Skew table not supported yet.

* I removed the support for preloading data for the next batch; it
didn't seem to buy anything (it faithfully used up exactly all of your
work_mem for a brief moment, but since probing usually finishes very
close together in all participants anyway, no total execution time
seems to be saved) and added some complexity to the code; might be
worth revisiting but I'm not hopeful.

* The thing where different backends attach at different phases of the
hash join obviously creates a fairly large bug surface; of course we
can review the code and convince ourselves that it is correct, but
what is really needed is a test with 100% coverage that somehow
arranges for a worker to join at phases 0 to 12, and then perhaps also
for the leader to do the same; I have an idea for how to do that with
a debug build, more soon.

* Some of this needs to be more beautiful.

* With the patches up to 0008-hj-shared-single-batch.patch, I find
that typically I can get up to 3x or 4x speedups on queries like TPCH
Q9 that can benefit from a partial inner plan using Parallel Shared
Hash when work_mem is set 'just right', and at least some speedup on
queries without a partial inner plan but where the extra usable memory
available to Shared Hash can avoid the need to batching. (The best
cases I've seen probably combine these factors: avoiding batching and
dividing work up).

* With the full patch series up to 0010-hj-shared-multi-batch.patch,
it produces some terrible plans for some TPCH queries right now, and
I'm investigating that. Up to this point I have been focused on
getting the multi-batch code to work correctly, but will now turn some
attention to planning and efficiency and figure out what's happening
there.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-shared-hash-v4.tgzapplication/x-gzip; name=parallel-shared-hash-v4.tgzDownload

����X���[��0��
E���-��v�����<&��8���y����XRk�-c�����Yj�M��M����]]��S��~����L�I5����z�E7�/���������~�#�g�~��m���D�Uo7��ns_���z����cN"��2�����t9[��d����=�C��7�7����Fu���(��U����� �Uc?�����.F�B����xxS0����n����nvy���V�%��ng��D������?
�cQ�^�����p�w��{��:���?��)f�p��������*F�����k��aw�`�)���ju������z����+���a�rPe��-������b.gqiwG��T���sCx�m!;�x�L��aG"�w��P_�R�`�������W1����Y���� ���~n�����nGDQ��;�U�F�3���sS���k?R���?�]�U|8;>O���O�{.����Y�q��K!.�������4���������&���&�oN����(�`�;r���Q��D#�-����}������=��v9������T���
�i����r����7�:��,.�E�C�/�{��y����`o��k������\n�6&�lfVS��,�nl�t�+��O������+?��
������QS���������H�W>f<[�Yr�����kw��gk6�%<h�f���.���4@Fg�uH���+?��~��}�u�(q�=�9x���m4�}Z�I^���i��<�
��7�J��������������p�mD���U���5� �H���Lm#b����
~����w��X���[�����+�cb����?�:+����o+�m��+X5�	�������5i��+�WP�������+�m�o8��A|�������o+�m��+Xy�7���9��xt�;�1%��x�M�m����<�B�A+p�������:���4�:�����D��o�s�U�m������
}��#Oox���-o0����/����r�w+�m����i���ZM��y����m��m��a�=�Tp=�����E���'>�?;=���7��s1�P�w�����w!��_���S�;��/�g�����e������_��Co6c1�'~������?)�}	�@B\��~[~z%�yy*�=�x�����_�_���b���h�z�;��\�]����;1
���?��7�����0�?�?5�Cp����H�C�y�]~�n�����������`TQ�!���9���&���h�-��p��+�4��XB�C�J�����Sv�*��Pi��L��r�� �0F��&&`-�L��D�p6��@����|nO�����"�MG���H4�NS��z�r���Y8	��m����������'����NV�N�a:)?*��Vk6W�4����aH�u��(kz0t������5D��<����������EO��m���I��^0��m�HR$���-$��Pv��FSJ�)�..�_<b9�n�<�L�$�c�A�5�����8\���L|�%	gQ��Il��B��������{�2�D��M-���Vt�IMYII#���7��������CoN��)�Cb
�8D�&�������j�Mp��B�]��&O��|&�xq��O��Li�cs�5��S|����',�p���C(a��=��9�R����������q����_��>�.E�����W��!Yy&��R���xv^��v�V�y������dqW�{�����?��:��Qg?�����9m�5:��J�.��K��.f��A�����
 �%�i�{M!�}u�3oP?z9A�+�:nT�@t[B(��D3F����})�4������';*��dS<�x'�O�hc��7��#��8���������{�F~�8}w|�W�}��6���-^��/$�#��\B�H8�q�_����+��mVo~�z�Qu$a7���E8��u~�����$;n�o�So���k�K]go�������������,��W��R��/1o�9l�G���������>��U�
��V��e��}ymm��
f��r$��i
��T�z7O��K)F{���x��r�O�xYN��#���$��co����K���!�]'/�-'q�������?�1���d�bn����H�{����m0�I~)^�'R�f#ukH"����BsK���t.%�z;��.o]]����_^}w����Y���������g���W�W% �[R���#����[�D��/���"��������7�r�\_k��mm��q���q{�+Z[��h�,�~���e��_�?��J;"�;�)$4%�����IPc1����YJ�W?�?�.d�'�����:�7;<�%[��1f��DV)����x��q0�&;��~���� ���P
n����"�G+������f�pv��5fi�m{d�r��7|����,g�E8
����{<��"��o/N��G�J���I���g`(�n����>S��K��`� >
��o���[��_������%�F7�*)����;�P��Ox����#e��EP6[���CrK�f�AE��I���h.���ri�Az]y�xz��^�������eI�]uJ?e7e90��W8M�I�,�����=�\�6���'_cF��!5�u��G\�x!�%K��G2�=h�Q����h����SV��?�DLl|���7x��"C*H��~1����/	#�.�J��Y��	��)���G�f:�����,����yD�c"Z��:�B��j5��R3	���	����eB�`�������N�;��D*u)	��&Gtk�\NQ�"�\���[��r9y�M���Fy���3�Dk�R��THg!����(����shj��w����wG����R����� y����W�������KO`�Q���s\�H�^b���u_H$a�����fO*�J��� T#���c��0mg��h��r� )����V�������+�G� C-F�)0������[����~�`|�����PM���Q3��lv +q���
�a�g�)s��K�5�
�v�~4	����/�W�����A�������Y"?�!��]���d~�c��W@���9�{��_LL$Z��eg�e�
�}��|��PR���A����/�'];������f�;�2�����p��K��|������x�� �m�E1B��������${|����K��m������W�N��s��H^�?�gk\nI�����q�E\�-�t��=�/�������Y?,�9����p�q6EFc�w�����*!��~���O������<$z�z}���]�7I3v�\|<�|�-&���JK<�%6[]C# ��&K����f��.��c>4J��B^��X�����q��q�4/�x����7�|^	��&-����;\��/��;d�'�u�Q�W+�#M��S<�c����/�`^"zT���i�e���+I��wX�Y�T�(�5��b��4=�[���M�Tk��h���O�:���iEy%	�-^oN@����1[�2�'�[�J)J�����k�����Ns�G��~�}�J���y�a�V;h������h�����D�@t`���:�w_0���(����7�)�-�~�1�V*`��T���F���A((QK�J�5]����>^��c���b7��I�Wl%��{NM(�t��v,w,�@
���5&���M�)F|�Q�.O���~��U�������U���F�#����x�/����")�g�d��j����c���!n8����v:�,���f){�Wk��5�
�?-��,��7�lh�T�Uo��R�[���������N{��������m�?d���<�����oe�iH���+����2�Wi/�����-p�U��;�s�*���@����dQ�<��*I��������\�����{��������9�<��Hv�X���eJ����>��ORC��A���!�������	."�E�x��|��G�9,�(�`w�/�Q��D�����q8GSF+GO�Zy�>xPG�~'afAv�\�|!HE�r��������$�Et����??_N&QBn�'.��.�`.�k8����a��w��X����w����-��R
.�9"70��I�w��
�%I�*�������E�C�~Bu�`.XyB�.�	Q:����I��3VE�!�yJJ���k$t��m��h�*M�
<�	����
J��(\��fJy����`��N/#�TQ����?Hy�V�MU�^]�������E@3bT���q��6H��a��eP!k���6H{�>�	"�k��V%���tv'�D��_�HJ�
�Y0_�wx&B�:�=c��d��5l��C�A[}�Bd���&�Q�]0Rk�i;�vj�����K�����.��P���&L
g%@!�	������j�|�x5���������_������g���������[\�Jgw�}�X���i{,���l���]�/O�e'R�jd��6{��^_��l[�R�`���Z�Fp*��e;"��q�U�
SR�;^E�{�{f�0WQ�[����F�=�Jv@��6�p����$�X�&~�~)/���Q6���5�A#��sW��@k��5H�X�[)_��[�9��8B�m���R�o�krO���h*Y��x��_qZ<���
)�,�����4������s�����7A�L*��~�����C���M������Z�e��z_'�,����(�P9D��A]9��O-�3����f$A�Kp��b��U��c�'�dHA{�O�fK��x��fr,b�)k�jrr�3��z�����v�lV-Q>��� �����*s*���)�n%%��%)����.7��6C�����!�2�E+d���������p*����S<�����|�X\��������:B&n22�t�6��i4��yJ�5}{S���^�����\��R�L���Rc�AN����	���/P�{��3N��{B��E���W��cgE�:xx�E���K%n�n4/��c��U~�sK��KsS�!��SEe%3+y���@o5Pk�w��f�~PWFq�Xw9�\��9�S�k��/��|0A1~��"�������sw�������q�8&y���/O����k	��H��������$�����XT�bu�\��(�����}����!�6�,��>�<�7���i����?��w;L�BJ����F�	=c,�
�:Z(����]��j�}o8�{^Z�Z��������A\�w�ws�*@�6�E�o��F�&����(�|{b#���ZMy4�������vzo��8�;��i�~+%���6:sf�M0\��Z������c�AjXU��Fa@+�?�n7Y��S��-��W��m�?d�������uX���0�O��^���������im�i�4e�������~/$������Sr9l���^v7d�������c)��~���JqE���s�/��o[�]#�P�8��!:��-X�S�Q U�aR-�0���^[�"o�#��Ki��qK@}<��i���w�f��sp�A�*��x�}Ev�hO�X������^sgg�p*:�<j����������U�������Ad��f7�-W*��7�X�OR"��#X����\���:�U���Ib���:������j����������
��7��8_ ��P���u�}+�;��JJ���'q��$��n=�L#�r�C��#��y��<��'�:,�6s��;no\�cq�V[q�f��6Q����K���Vi�L��R�UY����h3���U��5	�2���Ps����Te����Vt�A��'H<o�Z�h��I.�d��VXa��!��Bh#�1%�>M�������s���HG�m& �/�(&/�O>�Z �6���p�A��Z�
�t�_���W� (g	��;���Rp��������8A�WR�U�T�Y��{!q����d�1�C���
��=�OXp6�'�_�`>Vj�gp�?d���Br�8���N��q��O������Q�t�=Kr����	4�B���'�O�)b������7��+%��'�C�yL���&"lj�#��,��������p��Rgi���R(#��<^p\�~�"��]����v�p�����cps���)�@e���f��d�2�S���Bs���m����������,40(��u�g�{�7i��}uM}0�p�=m���o�{��)}+G?����n������t�;�����\���7D�1<���@��q�:�`�@+���~'��k�����~��D��j�Z��5���7��np>Y����X��][�����7�o!��e7G"�~��?E`��@�����]�����Wi�_/'�B�
�~�$���%]��=D����?/�%f+31�y�|i��9S��K�Of���x����q���o�����%y?|I`K����S�\�d�+(���>I��p��?JFi�tb ���p��C��z)�>pu�J�z������	�����'1���1&���~U����7R"���.z��#y�����TY�F��He-
�����]�	5��0!�H�]��Y��&��31���[��h�Q��J�pV�Q�s5|��N�e<�_�@t�k��x>��v6J��J�c�So�IY��X�D�:�!Q]'�N������@rX�������&�G��� W���A�#
�r��6�������N�O�o�7�B��}�*!�-Z�v`���u6�GyY4W���
�.�������C��������D�������X����7�]�a��k�G��'$�[!��b���rK��+�Z���_��
�]$5�
"����=�]"]���9��l)6Bjw	��v��a&����u<�l �L����=p��KH-b>����f+MQ�&{M�c=��&8�
65�������k�G�t5���a0��o(�"�>�n�KE�`*����������kJ��!,f�Pv)D��X���~�B�HXL�z
0��M�x����Z��8������yM2�C
slbR�����
)M����
&9���1���nm����a�2�P����m���G�[N�?���o!d����	:��;I�
��hS8��O	�0��M���G��Cn��v��L��V9�����'k%��3D��D�S�v�;}I^4Cf����K�[����K��da��U}��'��\R<��
�"�<#��HQ����*}E����?���>��Cy���Tw�}3���Awaa�'u���K�Yw$Q\�o�%t�bG��j��������v�����,k���?l
2JP<������S#����8d0�%y�5s���ZK�U`f��u����:��S�e��qb�P��O�����JqCz�I�H&c"�N�����dsf���D�4�������]���d����I���o������x�E�/�����+��6]�������t~����r���:���{�����j��&L:�N�u���G���Y�>���
8��A�������Q�����/��\<��R��V�S��*e��W�[��4����I8�&������.e�h���I-�_��8��@��`R�P��Qp=������������]�n��Q/��!
�K�L}�E�$��O�_�z���l��(��:t6����������d���b�4�Z7�Bt�����}��@���_�f��d��G����41��v�V�?������H��IV����x7�,�Y�wL8��\~W�� )rO'��n\������zc�">PV6�:���k����H�w�p9��_W	���+����T</r��|�d%�] ���,����3H��>@W�jr��0��F2	�d����
�9���X���!F�
DzpJ!�A�;�")�zb~���{����%I>�PX��$}�B���A|3=R�	/w��/T����e�7[p��w~u���+����F5��Q�k����jj`5hq��2�:�h��7��?��
�#"���c�}����o��A'�[/ ��c�0����o�{��$D����W���#��7�n�����"�`e>R�q��'�?}��S�#��%�/�)��?��ki����#�����]
����o�`b��<�������S�5��.cVL�GX_8}9���.k����OOB�����������*������Oz���[:e�9<�X����Mh�`9��.��,&4m�N���2�T�j�7#(������G�q{f�
^�38��ns�Y�:�5��
����+b~
�>�����-$)�}I^A��xUW1�/P�-B�n> �K��`6!�2�����y9��{C�9�\l
�=�AW?�������E������]�1+�0ZMh=y�A i�"��q<���7�r�������o�~G��J��Ri����N_�����a(/s�C���[��h%��+��/j/1.D�N��D,Az,-�A��S�fJu��s�d?���K�w���F>��?� :R-$P��W9��������/.N{�O��*��
�|^?��G��� �{!�����$	��TzX��yA�u���z�I�����Z1�R�#`i
����������3�����Z������!i-����C�P��n�#�$O<�r�$��fz�)��jz#�]�&�[+��
[����lL*Q�B�_��4)��vx�����/5_p��B�Ux9����I'
�� ���i������m%7���3�b��*���)��	d��P�;��������"u-Q���f�t�_w�{���E����	����c.#`�����6]��]��y�64�{��}�qY����N����*5s{C��m�0��,��5p���|�	�A�#q��P�4������,P��[@4����M:�F��t�I6���(H��4d�����'$,�:�Uj��2:�WW�'oK�6����/��.|�Q���5,4�I�7�JE�h�#�`GP��m�+Y�K@4Q���$����h�]W[+X��w��K�>�-��#y�"CIcp���-/o3�\,�
������M77uu�����4V�y`9g��������0d���r
*�5��������g�zj|6�89it�5Yfpgp�&Y>�����W_��$9����$2��1M��M��\6"���U������x���4��$��'��*����Xm�z����� ��(�$s0�vR>u�����G�s����s�p��]����)*��
T�<�����P�$�ys�	io���F�0O!#�4Y��8��r1A]�1s�A
���X�:%�q���f����/�Y����j�y�l=na�6�L������0p&�}d����r�'�#,��{�u����w�_����PX/��G�V5Qxg����YM����ftJY��w�����w��D#�5C�*�p�%�[�q��D�[.�����r����I��-��k��V�s����{�{��7��������������#k���)v��I�f=�W���A�2I.�/��2$�K�C��>1�*)�O8�0.�x� ��8����=�%OCy��e�L\���)/Su}���&�<�H���V�4��0�����J��^�Z�t)EH�@D��C����,|fimS��HO�5,�A�W)��(�~�-K��}�\�@�1
����XG
��	�X�?|�Bl��L�\��u����Q������y�������0�U���X���`��_�V3�+��[��_����_���'�]n��i��{��:�f�����A�`��_����_�ek���Sg��������z��S��
p�Z�O#(�S���������
@t��������&o�p��{DE�`���h�+��
/&'l������l�V
�"���|X�C�s��V�	�;�n���t;�a� ������Z��^�N���������
�en�.�@�u��C�,��y� n�����C�x���Q�������r����?�a�q��#�L�����C&M��L�*;)N�z�����_�bs�ZY���H��*YN��[���3�S�+��h��dS��$~���~q�V���x��R���	��8�;�
��� ����C|�;U�5���Kn�9j�:�s<O�����5��/�����:������0:���5��>
��Vp����?�Z1am9��5�
�2Y��!��c�
P
�(5��3��X�3��]];o�/t.����`�V���X�v(dF{�����?�_�w�?���zS���k�:��~*�K����k��e0��=��u��a��_����^w��k5��^���g+2�Hs�����1�*�T��A�d�;M�w�+'����pf/@��?��� ~�d��m�
���c�i�;x��� G����[u����R��0Q�M�q�m��������S���k����$C����1mX�V�V��~�1���Zm�ow:^F���~V���%��=��r���hk2��'���R9�� ��x#y��z�n	�P*?��wu�����jS���Y&��r��2Z}�5!��q� �*����&@��Ec)MD��K�w�c�G�0����\�����*�N���������vgL�����K"������B�n�r����}�.�Ys?[lr��Zb��@'���`PZ	���g&�	%�4�0$�+���t�~6��D?�����^�[<���M�}��k�����O��-�7�(���$����#�r2��
3}��������R?^g+������j����7�UZ�&������	06��x�a��f����w/�h�&Q���JXH�LC/�&��;gJ+&�(&�L��w��/��n�V�x�0kv����/�rs��i /��4H^��_3��cE_�d�D:���B/���|i����E��m���2$�~�)d�AY�#�
�I�M���'�M1�����~Kl4�AtNT�L�_���c����9�����(wZ�f�}5��|	�p�I��L�����$y�G^����k��T0����)���?��fj�
��'C�0�C��8,�_��BgAc9����X�W����z�U;������R��0�ina���C}B�M�����p�q0���$� �b��F%���]�N*Z��$��ZR����<Df�-xAvmJx�N�[l,�\�e��,/h�`?,�Q#8�-�&�;Z����d�Gc��yT3-w��0.���t:k�0\�
C���k�����&��u���	9V����<�H�a%��2_�����6�->�S��COT�����\��}���	��I�b}/���!�Ud����g1u"g.���px����Q?���Cw�E�HS�	�04]N�UX�qZ��N.|N������%t�*���}9��?�4�80('�G
��o����K���;=??=��X��]7�W#�&�"{'7M0�%��[aJ[��[��H�5����tX��P�Z�u^�,n6�b1x�NPj;�h�����w~E����;�z����i�l�99 ��(�]��[o���o^'A�����
���7�^J�aq��I�9�����J�sR�~��^�.O����	�<�2'L��Lo��X^*�Bq��������	��dX��Tex��pm`w�����?�p"����(e����$�P�{6����5=�i���I}<���	�)#����:����m��D��jVG�W�@UJM��Y�)3q��4�c�q�'j P	0������$����K�3`s�J�D�W���Mh�!c�r���7��XJ+�;�gX5g8������;]�.�|��z�
\1�vTY�Dqx,5ue��N��E���&��u��^5����0����qL���"�;
�u9'w[7��Y<�����������d�5Yp�{'>����C���k�@CdQr��bI���H�:����W��y�s��&��K����m�B���R���&���3�����-T7j���R^�p�YThDHX��[<�)�Kb����F��S`��	��Q��A���*���\x7x�
��<_�����5�D"`T��BP�#��b�E	�����3��r(����3��N���>�������x��[�
�=$J�1���w!�~�}�J���,|�2�B}�cU����z\*Qx3�������	�=���J��l2�fU�-Y�Xk���D���Q��QE�c:��U[U �\kl}[�������v����p���l"��`��RhD>��!C�Ry����x����(�<K%8UYR2Q�T���{d0%Y��&^�TU��@R��Y�@W�|�m�z�.:	/���W���z�?�(���K�f�b,�R�%nji�"K��fF��Z��MX�!HE���wA����U����@$E<�?/�p����V�"Nk�������O����0���0
����Y1�rw���3��@.54���X�����4Z�wm#.G{&A���j���di���� �) �(58fC�t^/���gE_:�9j�`N�sW����HN���ck��@�}j
|
D�om����������� ���i�>�7([|�K�[�r�����e��'8�'�O���P?��e���=1��UA-�S��N0��*����������\��v������8�SBj�#�=�A��D��nW�e�&��P�����.�q��r��"���K>�
F���3x�N��HB?bj)C��Q�&���W��mB���GNsU4�sQ7]?u�����L'S:�[�����j��9u����n)��4��v��/��V�
i��������y����f��jz�4��W���zUA+[�O�ET�{��
-�o���qoESb3c����}�w������.��+�f�,�D����4�&�]Q�m�[t��>�?����������f�=��H��K=��E����P��I���g����d5���3�H�3�����=��$�I�-�^�;0����z�{��;�������.��(����fG	�e�����>������������r��e�~j-V�
P:\x�;�UJEdr�m����/��$nJF�J�#���;TP#��%�P����(�r��c�F�t�j7[��|�R��~�f?��	��S�5������Z�m��W�r_:�F��1���J�'��J��D�U��
;����� �vP�R�JI��U":2%tA[C���������
�0����4`��	�{R�oA��m\WaV�\d��[��;6�t�	����E@^��b�%}tv���������|�N������?J�����J�p'��r.��n\]�Im��V7MO��W���"�'>�$J������})E�w�1~jpr8�	���YPLi����A�oJ)H PZ��*���b��$5{��QO.�t:�-���A�]E�1�H/��v4j��J_�
Gi��H��W�=O�j��<D��-�/@���=�+���%(�L�U��f�Z�A����L����[D����6e��	Q�f+` ��J����
5�(��KP�(3��-�=�b��w~L���}h7P�A�H'��)�r
���I��:�ue)�j2�m�6���>��F���W��"e���S�;yU�������2O��(I[ ���|	���2/��_��(�#A����I�@�	�[H���w���=L�#�=����9.�?/�\*6��������������H�����S�wq����W%��������kTq}wq��v�����^�u��%
��4�.=����
=��q����Z:����40p�~��}o4./?�4D�����V.j��W�r��J8�������
�#��U����[���6��,�v�b}F�Ix��[_8���3u������~�<��y�\�E3#�����4�J�X�9��K��-�mp�{p��]�����
gF�L8�'��AK��F5����e�2u�w2N�Fd8���-c�
�q-P��~N�p}>�j��{���j��y���S����\��Z�U
�
/]v��|w3��
�e�C��{s�@��X!o+|�y^�mt*�))���Yz�Q��R���|�����������25VeK����K�$����V������5���d�J=��J�kV%��
(|��{�ZM�Hb����fU�/r�N���LB$��$���B\�R���5d}Y�%�i��3�p��t��\`�ab��x<
�3x �a
�t��M^���G��4!,��n��M�c�R����<}��-��������t��h���jj�(���Lf0�^]��R$J��A�"|(M��./�A!������p���F���[��dJ����d�H�h�}��Ey��c"���*����.�������9y�$wsG��+�f%�QE������HNGHW�1
Wx�\�P����)�s��E��nN%�P��t.7�tg���>0���p48/�X�{!�["��f4��-+����9��E*9�VJ#����lG��'����PU�g#��t��;R��~�S�e��|��<4>�g��db��u8��O��u�#��������>/���)bb`��gyR&�N�[i4��u������D�+L}y�%�������&Wq���w���9D�D�B)�����$Id<�msm��+�,Aeg�5=�r	x@/���51|�{�R7����M}TD�Y�c�eI��%;���~��'�+�Y�[�Qt���7��������+X���r��e�B����`B�.f��?����-����/�U�!3z�"��>&��G�"���������[(�J�%4��5v�bFgo��[��V������-T[Q�
������������|���4h���1���R�O,�i�����jdx�H,%��80�eO��Gq7��p�s����"�d4��q����0Ol��bd����=p}�0ZPE!)�V+;SJ�O��<���T6D	T|�	=+��������}(�,l�� ��6cP�S\�+�����=Lg��;6��A�u>�x�
��6�|�32B�w����h�*1�����P�y��}����Y�����u0��m
����aiy�Mr�k�������*���GMCl�{�m6��Y����Z���u����6����s�5���H�A�?"|Cc�����W�{!�]J�[*��6�(�"[���l��#��%��������������.*
�=h[�{�
�i;�4 �����H8P�M]��j�R�%����P�0GNt���R��#0c>��2��Q� �YWT$�P,�BBF>,�lz)�*����{Q
���W8�9�v�����
rL���Uh��D��pIm�\v���t��=u�GN1��QX
�X,��S���@��zk<��v%U�D�O������<d3%^����x=�;�O������=�P��a{������)K0�^�(�
3U��!�t�����
�,��h�����bZ���>��w���t<��CD7�XB>1�������p���`gJ���q��58��s��M������-��?��8�t�7�F���`P~rVu���	���.�^�����^�|{qz�=������gf��7��d#���������W����Q��r�XC��@�����iT�����uh��w8�U9���D91�Z��J�C)�%�=��q�������3���OR�P;L�4#w�b;��t���`�OT����������c�N|���.������@�����YoE�
k���.�+��J���}T�D�`���/r��X��m��n�,3:�6��*PzA�Q�L��V��cga���19B��Est�����h!�����)�1[*�����z���R������7{f��]��"C7v�+o�p�A���dkL�[Q�u�d2
�������B��~R�W��B��h@L"�~%7N{;7����@L��O
6���s��U��Q�+8A�����r���O���sLh�a=@����������J��}�o US�^R����y�}��=1��4�T�kR�	U���Sp�c�{��?��F�a=��y��0������g��+����A�3N6�������(^�xC
m7Q���@���^��GX�yp�q��LU�5�����+	����Z���2����x�B:�;���~�F�~P9h�%���}�[d8�i�P5I���$���P65��I��%���=v��SU���*'�g�� �������j��d3�lAH���%
�g�l��
_D��\N��H����l�u���r�_S�:�����u�����9+V�l��r��t9�r �(�U;�����r���~���QV�3N�{��-��Fr����{��*�O�2��E���r�pb�
9�XQ�B�$2o����U��}o�C�U��=9���_��}���[L��L�M=�W�7��~��0
|��`�j1�]��v�x=���9���J��('^	han��V��Q~SA�MvC6��R���)=R"�2C�T�*��8�lk��~q�b'�K�n����������15�������Y8h��\74��FB����3n�r7_.�nn���8�r�_������!�O���'JU�2�|���j]a�H�������S
D���IZ��
��-�����#sV�r�i��.�ufo{3�����_#�u�4G�J6�A���B�m�����6fXAEJ/��Kk!y���-�BN�ad�Sz4������6zu6�� W5��$'�S��{�Ci�$Hp��!J#��������@�B��4��BZk2f�l,�����o?���;~���K��G{��m���#:Y���pA�KR��9��hHs�H�U)��;���������?]}�p����K��-��]�e�h6:��h����+�����J���h{���Z�dK�E�L��:h[>P���K�����$vt��a�,�&�W�m�k����]K/8�����(rI���6&jD|t�KTC�����bw9�q�)�3�8�L�����
ru	�zO)I��r�4�P����?;R�y�wS����Y0*���������.AAl��d����)C��eb�Y��6q�-ywz�����;��k��]c�
,Y�h�3B��8����0o���d�{�wp{����Ky�H���I���Xy�U�lr�� ����F���o���F;��������f��t*�'������Knf%\Yc�t�����bD�z��#I�5LCy��9���p9/[y�q�����l�F<}��~�>����gk��������-��	T����h��e:�p�f�x�h�W
�s�����J��lh&zb����']�Le�
$����>�>A��`���N���}�Z'������#Y
9j�� i�O������9a*������>�i"�����&�������$xx���d-7A���R����b�"�'&,el�`|����!�������)l��d<�p���F����r��v��M�I��p�Q���[��,���7�k�i���YB8���L��)�p����{�����%{�1k��@w]M�tj0����pUEt?��$+��K�*y��]�bD�\RR^F���-���G�.'���e�a�oO���p(�(N�r���s-m��4ly)�����:����IX��xx��oy��Q6�[���G�8a���#������9����E�g����t����m�i�cP������U�/������#��w[���*�1\I�?�����i��snMfU���cvm����r����|e=2T�+
������n�K�s�G��d?N�A�.��y�����y�:'������l�����]��'(n(�=j.��'�;��\v6�/�RzE��K�������BW1|Qcp7)�t<y]M�z'C��	P�$pa|+`���[pP!��|��G��W�m�`�iJ�e��u]� nte�D��v����,6�����f>A����w]���o��S�Ao0�����m�1B�8}c��:^S~��j�b�eD�����
*���y-�g��7&�����%g�-�B�tB$��H���i��ws������Aj��?[��b��b�%��h������o���-hC��{�^aiPf#!RJ6D���\�*C���j�������IP1g�[F'��2�`�(�TSU�Z]�	�;�E�&[���x�t��k��U�����J��8f��4��
63h��j&gG{���95dt�7��p�n�u���C��~�
�F���D��z���
*%�TX����Gl=�'��H����D8�k�D�\��)��t*C����7^�������z|��:����{j��
����+';�H�iO�|�E�]�����&��%Z%��aQ8�:��v��C�7��u;m:�Db���{k��E'�^G���+3��n50I:
��*��K���,�@*;(&��HJn������I��\��R�[MZ�E���j�@���jv��d��l6*r2[[�M	r���	
����;=2����7��;j�YW�"X�qu�auJ����gS4���!�����D6F���(��2�e�fX���BX��[��Y������Vu�����
sh�Y�>�oO���u+�O*���h��~�����P����r�8vGf@V ���u��T�R_��ON������D�f�A��������t�r;���o��5a�)HV@���t�����L�(��U1�.+�������{ e���S�4[�I���
�D�4*(�������(^��/dWb������I�����
���Nz�PaT�*!D�pUv���%E9�J�$��`�-��!�?�Yc_��E�Tm��Q0�`v*�	�H��aF����e?���%3�:���O3�e8��r�e��1����]%��tFGE�8����%N�GG@��8;,(rny�\� 7+zyR�j*',�g����1��
�0����9�%Ui�};i��������q��:6�p�-I!�����|�(�-L/K�b�sTh��_��2y�?��Wo������kn�����U�P�E�,D`CRxQ�E�!�r���[	]_���U��6��Uvt ������Z�@�0�v[N6��QVD�<W��'���A�B`v���D����m</��PYV����RDHKZ	v���P�hg�����,�!ol���8�Fg/O��K�������������Fy�`T��=�������/���4Q�%�8�~���;V���y����W_Y����@�&7"r�	�����*���������X��S�)��
Q	J�9���]�2b*A��
eS������qn
C�����[��`A����#�P=���i����s�,�lo9h�9CI
���@G�fD�����atF��
���a���k���||5����������!�<R��K����!��8�c�YYs{-)hE���p��r�nj���EJ���-�ra�6p�BbF�=��KX��Z>�iO�P�{F�R:'q3s�_n�f�s_&����V�4�(mS��	��0I5���A6�����> �p�aQ(`R�v���y���3~fj�:������N[�#(Cq0u����|������)����S����&���c����C/��-��rM�
�2��~����!y���%w���+:Zq�R@XBp}ft��I�i��G��T�I���Cy~@o����	 s2X'
Y5u`����X�3�_�!�������7��I�uU�w��	��X�O0�t�'����1_d�r��;�_�fN�{���Jtl}�r�b�N���NZ~"K�71���YDCDs��H_��
����)vnz���,H�]2d���{��Q����0�q����YX�������}�yc�N�,��Hn.���vO\��Oa (�%���K"�V3�i81�k���J����j�y�wB'�
���&����
�p�4�[����&��$��W���f�b����JVA���+5q�h�19�V��J�N�4�"6����C�����GGV�l�6��=�%#���y:����k����l�����{ewn}�����'������5�X���ySs��+[)|����3�c���9\�����P�����h�)>�x=�2��o��L�!�<����M�����0�-�w���nH���|V����-���D��f�$9�3�m�]w����x��GW�fb�����0�:�v6DQ}U�I���*������}����y���x�����%���S(�}}y�T�+ETr�-�r��p��/gp�h�T�y�e���c�Wm�/x�OF����dv���^&��C��������k��e�,%�"7k��E���������z�����w��b�##�y��?����,�C�J�u���Z��,��-�{l���������:",x��P�i�M�^��<l�j��f���D�^�����ju�������#���2�7�P	���x���v���=v�����M�y�<�_�%/�`�M�)��/�A���z�z��8e6&c;`:*����F��0F��;�����7���������&!X98����r����k�$.����[����t�U
���L����c����dS��-�����p���W��1^fqo��szv����wQB��S���N�V%����G��P\��j�EFz����I�NO�=��,;TH��&o�N���
D��v��eWo��(Y%Pq$H��Z���:P����[��@{SQ�m�����\m��*�|������lhV��
@��H��@�1$8W�Q���|O����k��O�J�n��Q�g��S����e��d�TY(�T�#�t���np�"#�:G`�b�|O��P������`���kS��vQh1�F\RU����l�Q$�o�)��}[�5?\�uz���1��* j>�q�i�0\���P�+��1|P���=#�y��7�dd����K���r��o����n6L��\�:����u���<���*�p� R�>W�;����9U�rS�������|7����Y��9~<GW����,1:QG���4�Z@rMh�hK$+Wg�X��/�8���<z=����=�EA#���������W�
P��o��$w���#U�Gz�e����J��
2'/��*c����_N���{�����5p����n����igc��$T���qp��g�a��N����9[y�T��*m���x�-��V�����,�<��-O��"�a�O��B�{rI'u�?�>����%�$+�k]��"�N���Cq��O����e��-eZ[��DwC>�\��\�"�[1_�{�Ym�k��d�Dr�U�T �[��Tk/`���lU3��A��|;��O��	��������������NS�.��r�>�b]��k�fV3w�����mM��R��Mv��N�/������lb�D&Z�a%�FD�a��H�����>���6��2���U�~�4I�28Z'��\D��'�m�;�I*��b9�����6�fhk|[T�x�SL���M�����R�+_���\���D�~�RHT[@%�x��[�5�m��(C�Q�><lU���g��8f�
E��g+�q���X�O����[�;er��#Zb�o�1~��6��dQ�K�.�k��Y�����������If�����b�������K����n�\u�����.��?����YD�1�������>�O�����:�g:�o�[�	�Ac^������v����0M�+����K�-��h8��:z���H��{(U%4%�a=�gD�:�Y�2�l
V)6j�Q�.�ngX�s9w=3� FV���sW��qx�^����T�������2b�/@�+��g�����g���V�n���������^��z������*�e����2>h8K5��a&S���w�X0|����Z����B��O�������N}�����V_N����7��	�+�
(�"���������l�pO%rE�)�����*���
�H,�n����E����I��V&�o-oM"J3N Z�����2��@�e�<,��\���'�p($N��fA�8���(���9D��0/�U�#�Bm�))��>�4�|d���7������k������e�3�[&��O[`�*l��t�a\\���'�z�2��?����}��~0��@�d����ku��8>��s������Y���e�����^D��u��z	��� �u����1xH�(6��t�
��a0��h���g�����|�!�TG,����.�*��� ���$�e&i:J��%6�G���D��r���r��I	�v�<S�CEWtf��i:c����I�r����|dQ�����0s�����:8�@���O���:N�
�<1���r3�M�x��0I��=^�������>���F��P��z���}O���3L���� �%���0^��w	C��jl��K.�	w ��-&�Z�����b�,��gzi�SV%�������������X8�-�\	�y���f��~��y�����g�!���?�D�������9-��u,��n�
��=�������;���vk�y���h���4��F����f��������f,��U����Y^��>�"_���7������?K��]�������_HT
��������w�_
�K��J|s��#����������/�_�^;�;#"��?��d����C6Af}��C�����E�;}�3���=&�!t�]��+��R�r�>�7�aP���d����]���d�Od#�Z�{�~U�S!�0��3���W �a��>�N�����'���\����n�C���S�d��Z����H6�)R R���-�-
}s`Jx�����������:Nl���;'a�7��p��
3��/�[`����Kxd�6Q�������O1�7}5��\N�����y������f�������QP��7d��O�_"#�I�� G~+�
��u(>��A�I�I��a�D=���>��B��}�c�=hT��G�s��$�:��"	�]��|�4h��V��'C�2���dC�D_z�k���5��+0Dk��<����/bE*c�@Uu
 %fE�c6ED�N����JMq:s�����OZ��C�"��C-@r�{/
HTf^]N�����X3����H��~B�<_��h���fz�M�%e���_E3)�w4A�Tb��7�/��b�@j�p|��u�����9���M1��/������J�����"�R@e�U/��������H �
(�d{M��@F����G�
g���QE�_pB\XV����xF��J�:�0��W�N6X���eW5b�;-����X��%7"���)�X?���#�f�0�gn�������>������:q�����&�x��"#�YW9c�`���/�$5����4	OEe+����MB�PQ
�[NHD�mz�i8(S~
&�3@2cAL��d:�Q�9k	���R���(�U�+�+�����p��K�Er�"p>e�B�g���/@��k�
%�����k�3��5%�������Y�~*��Z��W!He|O"�T�0~��J������m(��Y�*}&0;��@�7������
��/9�l~YJL��W6��#�CxS
�+���
�p�0�5E&������'�2RX��IM��Z5����k��T��}�"c��@�C?�'�&������b��pu|Wz��'&�G�xaV�nt��k��~�����
�&�����(ZN]0�������Y@	d4fGb���H`��O�/����I9�wq�;��=S�'�4R)��PW���g�.S���)��U1Nh�l������@Krr���m#���8����y��1�q��m02������ ?	���TX�"W��&\��6��)���
�����Y=5s�!�Sw���6�Q���������o��% d���m�e���8�
!\�$vSS��Iz����7��Hwrf�8���%o_W7��!N�hm-
���JKc���f����n�V����`m-���J-����Mu�*��$�xx��y2{��/#Q����,�Oo��H
�Q�C����:e�2��?	F���A�D��Q�'��a��9�:
>�k�]k2O,���H�����}�*��{U5@�H1�T$������h'�uZWl��f����I���`�������U�Y�zn"(}q������l��<dZ}��Rj,�a*����[��R��%_9I@m��t��5���:�������"'���s`�����q�8�k��Gn���:V��8iC0��>����G0�>��GL������r���-��B4�
���k�e��R4���{Nu�}_��He,Df�xYfcQT�`6qH�\=#	%���s�n	t"B$<�z{�n�u��ZD�V�'�LE6U*"K�t)M��������c�l����
��i���ic�*d���>�@��o��YK>N�!ka��v�����t���J�I�QT�����H�����b���h�������?���R�?�h7�����y<u_�>����s�o�X>�c����	C"([��JX�3��L�s@y�i�m�����2�P����d���)�,x��DA^������G��.�2�9l�*�����l`dC#�Xu�8�����
!�4�nmUZ�
��sn
��D`7��7F���Ji���}��e|�@��B��
;����th�g��X�?�Z��t� U���u��B	���.�y�:���R�h��X����>�I��
f�����9rG8�7@�`H��>���U�UT����S���-'�Cn���%��P��K7���|�a��q�!�
������������q�������vk�����������w/�[��pg�|��?��[
g���{a.L�FN�-2�F� �c�Y����(��XU,�g���MWp�����H�n�H����l�M��/��+vRLX�(�A^@!u��w���2K�a�
��^��"�����v&a���wl������������uK�
�U����e{-i.E��4<i:J�y����KBDi�t�F>U�A4o�
�x�5�D'����C��@	�	��P�`!�N�[in|g��h��|��
"��i��r���_��Hn�;V>��`mq��<�ndV�/0���H��&���d7Pq�1��j���k�*�A������	$�RN'R����M���pw(���C{G/���M ��l��L��>�B<��������X���[������"x���?�����|�=P�KL����*��.����-3�~���W�Rh��h�G����m�6�q��&t���\J%}�#�:f�+�-�`1�����_�u����e�H�Uh�H��:��i61�5�����) ���u_9�R�o���k���}G�����^��F���l�:G�E�A#�)yh���&"������l�fk TV�a$��l�+�b���&}�Vc�-C��gzH.8�Hn���%
F'b����=��x�4��M��3p�0���������V��k�0H��V7 �?_�v�t�����N�v��~�p������l%����~������4k�/��b]'�.���g����
����&�5��,"��^����� 0�H��x�
�	�!9[���9}�YYx`�P��+oLdq1]�3sX�Q8�F��7��#;����
Or4����J� Q�:&��?Y���������&�Or��5Y?s�]i������%��5�<Jh
��BlDI��E���f�r�iv8��f�������G�u[��&a��j���?����� u����^s�\��^T^������O�l��WZ�l�_�k[�`�����%��z`X/]h�S��qDv�+��8Cydo��b�M���A�WP\F�S/^��7}!2W��Es��\^�S/��k�t�I���By~4���q�V�vZ��a+��g��&�Y��qj �?[�)��c��[/���|X�=)���N�-�a���R$\��������8�����O.�f���4-&�T�t�{�����\��r{��������	1�+:<��_��9����������cnl��.z���]������o����$����up�z�&r�et�S�����v(�Kp��@��w��sv{����	e$��gX�����h�`��mH�=#B�K��p�]�{���borQ�)2���	��q�j������� ��4)�nGMu����y��dP������}�N��X.��'sy�J�1���7��/������RG�E)#���{���<��=+r����R�(��89�W����R��`����?1�U��j������r�p�@�A�C#B���wQ$���;����v�h�w���"�Z�����A�����! �ARM��: a6��	��Z�D�!FY���>f�8����]�G��4��� �����)d�z����">|w��()�e��"+�>�u������n�, �Y�KO.!�X�������:��B��hxc�V�
�r+�W�}�	�M��@�(��i�*�Ua�hET_�:��;��"�/�Q
fem�d@� �:�O����!$(��h����x
9�h)��t�&'���k7��-�c�?l
�$<��vF�rQ�������Eww�&Y6���!��H����`9�;#�}[���"9o�6�/��JJ���%��>�I��j���:|��<���B��.�	g.�KEt�; (���W��{+v��7^����T��4�*��Z��rF1�Hq��V!�����Z~T�\l-e�&C�h�T�Ci!@������k
�I�f��Q�&I�����"Q7R�Q�{��z\�����9�4�����dd�T��p�(*�A|����#�|�o.�O�N��Q���V���!��$2�X�����'�����\%��9�:X�����J�����@c$�2�j_����Q������"��������t�3KRe�E��$���Fw�������:~��[JL�<�p�D� nB��sD���J0}m���o���*<mB��3���:����)��'�*G���#�&�K:8}�W~m�������y��> ��e���9��p

������B�������P��p���A� s���nRb��~��w�S�M<<2G;��Vi�z�;�����[�D5a�'�;`�PeN0�
�1/U�M������LJ:R1�lK'����W��eEs����;�V<�F���`U�0}�
��&��@�#Q^4N����L�6U�%�O9��(��J����o�iu#�5}2`C���Y�Z�R�Q��[R����@O��f?�03/Y�$p��dlZ.&��Z����6��t�a��=G�%rV_��s�A�G*��q�dwa
��b��[|-v������ETl�jR�S`��<L8��_�&��B��5=�G\wF�����-�jU������@Q�!O
����2�D�K;����Q�
�E��tI:'�x�r��~I�j�&G���$�k}p�;y�CI�����,"����./��:�������a�����%�����3%fU�[�*^�t����
�[9,���>��	",�)%��XIQT���G��+�JcQ�,��&o��"�|��i��]+����d���?�<B"Mu��dFd�0�lm}��9���3/��b����utBwS^7:[g
����B���^���������c�9�U1�H�A*4���zD��s������by��2i��md�w����l>�	l2�:#��T'N/.��jwU��as�H����NP�6�xP �rs��l|�i�7Y��m�M?����k_��]uh���[��*������0�4����
����sR��Ad���$�!%}�e���O�9
��%n���BO�@Pz���'�$�)�:��5[i�(L)J�����\qpgb���������������y=���A�Ffl������
Z�
�mF6ig5a�t��z�-1����_��p,h���-�a�������������v������p
���G���j�J�&�����*��|(����,��������:�6'_������z���]�������6�l��/���D��������N0pO2�,-!��c��Q��[�U���V;3s��5pwUu=v������0���u�_��)V[�):�� �i�4���n����s����>��"�y~���R���d�<<~	��=K�����p���10����&+'�8!�Y�l�$�����F�\0�������1���!�Nl�|Pff�7�����D��!g����&��Ct�S0&P���D?O���s�D�r�L����h,�s�z��K��C�`�]&z|rp�j�5�h�	�-��t<u;�m0cl��;���vJ0T����`j�k���Nf �����0J��s�AX{�5��k��tfKG�]�^�Uby��Yx�!�u��kn�_���L'��T��g�L�G�:��b�,Q����7�j�<���E$����B\��c�A��e%k����-�>{t:_B��\���������j�r4o7����p�3�g�� [�0��4�T{8�-�&x�1���0�����\;e��5�R��[���"��"�h�u/��E�����;�@��y2{�|���|�h��Q�X7I'A,�?s��<�NJe�<�[�z�j0Xx9�r���b�� �CgsSo�K3��U���Z����6�i��z���O�/��yYY�p���n?�t>�N��%���v��G��s����R���wQ� icX(<��6b�&�����$�G�����1b4��m����A�/�������3�X@�zZ��x|V1)k�z���^�T�3�
w������Nok��[�cRv�X��}���RH���u�������$�%D�/
��N�0���$oPo�)x�c0��t���������f&�Ah�7�obV���E�k�ukM���U�z�ic~���:rS�v9+������\<h|�g^oewE4��,�k�����e�����r�s�Y��4�GV2�ncC�M�h>��6@�Y��@1v�i��q�������H7�
/�J��,��,�O����+�kEL�a�,�{"�r�&�'3g����5'�^�����J��8��H�i�i5l���)@�F�
zH����i}w��&��@����{6{F�+�&�k��c$�G����lhz��O�A0*a�
hf�v��F�dWC���`�rw�	�_���L���
�wPA7�nNF;2�@vGf�����
��!wEo�� h7[�2�4A��+�f��@����P�������W8�s�n��:I6�9����d�Pe������]�����u�����������N������LgC),�&A17���������SWVpR$��$6� %BbY������7��$%�@�����T	���@K��be���U	_r����Q��\�����v��k6/w:��e���)�~V12���_6����7���$��N[�r����J	j������A�����_�B.ap�Z����S��Ng�����x�������_�B�v�
~Q�f�}������:����S}0����������U��������X7O<�Z�����
������"[6}����������V�yiG�����\ 5���Ko>����S��~���s9������v��z� ����)B�8V���rA��h��D�q��T�������g�����8{����7?�7�J�:s��*��F�2Reje��_)'f���?;�����������U�����x������WQjW8p�6��Z����o���@
u�
�n�3U��G�e���������nu�������n�nk����t��@��m��#h�����[��Q]���-��r��dR�����������e�F���X��{����n7����f�i(�����TA��a`��o�z���W[8�o���Dq���e��b(I���I4j3�h��M��D%
��t4E+���������+���l7~Q���l9W�Q�.����xJc�1H|������Wr���-�� LF�h���P�D�:���������GP<����:����r�:�A�����7��#���Ccskq{����������%=v�m�z z����������2��n�0R�|�N����\/wa*eu����(ByE��l1B-�n�����B��t��+�W������B�8!��X[\�M�]�m�S+�Q �YX��Q����y������0�5`�B�;�.)!����`>��[AT\2X.��DKL���s:mqB��}i�{>C��S�����^��&��E����@�H;����&�`u��	������Z]�E(\���h!q��7�E0_�U����2�E�>����AM^�<��n�*��$�J���Xs��(���@;NQ(�����j�H0�6���F��'�?����x�W[��P>
�o�lp�� �
}�6p�+c�+�0��W�J�&��V����e�8k��bxg����$������^�8�������h��<t�d���lbd����%�Ut�_:w.N����!�yr��l>�.�l���0}��5m}:�kW������ZH�%(���}G0~�2��D�!�fA��?E`*&|Lv������
�@�,����r���#�
����-D�&V��+R���9t�e����R,89/���z��KI�lq'w��6��v�������G�(�j�!J�������i
��i+J������y_<�$0�
�g��4�n�Dj��
��R�h���6,��z�����BJ||�M����--M�yx����G��<N��3���+3P������=]v����,a����\�t�Ng'�G	�78�q�����z��n��K�����h	�.��W������z�t2T0�e�;&���o��?}|{�Tpo���0t��}��%y]���@[M��
8
6��A�B��g�����V�$x������#/�,��v��fo{��my��9��`ft����N�K����ey�h�����UMu��r0@����p]n�q<+��N��:6�������K�3\�Ta:+r
���}���o���cNa�����hl�'�$� @[#r�����\n���e��c������u8I�u9i5��N7���>f[����2�q�������AKd��o��#���}b�?@�sMP,��v�6��1�����������w��)O�O�${�����~�-$SQ/%@��|'~�xi�6u�����:M����^*�r��y-�����s���A���fq= ��$z9�G���Z9�O�����[���J3���
����V��I���V�Q������F�Qi����Xy������l���a{�@.���N|�P�Kyp;�/���'#%���}}��ujz��`qu���{YQ����@�����:>_k�U�kA�C���u�".*Dv���J�(���I�����������Sx�����1�a�V���=��5��!�6��Dva���9J�Z��ng lY����T��1��r�NR�lp����X�y�4��5�6��Z�"<���"��^�w���d1��()y�2	��&�I���:{���H�x�k���+��I4��(�6����n�����s�W������6:|��:�������Z@]C!������s!�"�D��0f�����e�K���!��-Q���n����f{8���e��h�?���?��*wq���	��S�U<U�,n���8K�k�5�k�(�<-aB-a"��$0YL��"�0a������16�`����5%����AQ����' �qC�I����Xo:���B�����f���"D�|�%�\4�����u����$�#���K
B�N�h
6�S=\�K�'K�����w���LIp;�5����9@��%`�'8#A�&=�"
�]G�(���A�w��gjS'hT�4�d��q���Eqj���j���d�����ZB���Ibg=���a�pg�*G��a:4V�
��r�r�E��[���q(����LIK�����1���!�`4)��Q����V�0�8�6E��*�n���(t������y���p�1��S
+@�����'K�d���XR����4%6��`��� �����6�[�X�NO;������]��z�|A0I(���/v��!����&����BR�Y]�
�dT���x���Q��l{�������yLz���|K�-��08�t��I�^5�����LC���+�1k��L'[m�!��
x�+^�$aF�"k�&q����F��g��
�B�J`��E�"�;��_�A��62��=/�~���%����
�	����������
]��xIIY�1$�j����d*�>���@F��u,L���`�WL$���J��XZ�/�j���k�Td����t����X�D��:��u;��L�Q�u^���G8�R��ZT�������
����K.��(�;�#_��
�RS�_O5% ��p����fM/&2�i<u�/��M��7�D���j���3AF�J���v}l�!*Oa/Q
4�����x�������,f�����.���r�:� #hp�d����[r��1�������f�����Y��������_���/*,�[�$>�����V���dbu��nC%��a���agu�h�_��[�md+`�[��i�A�+>�������e�b�gQ�� KR8�� ����2V}��?��B�8��ylT��8����-	���'�80�Z"���u�j�4\D(c���#��M8�ty�{�x��)�U'��_��C��hI3�+$���U�k�����P%I��]�I/���U�Nw�����������e���3Y<���e�C�(��&�
���8��{u*Ixq���b@_�����@@��F���$��O�������|���d�)�!G�����@BX�"�}$Q�w|�C�@x7��=S�������������n��L�&eDQ��N�+%�T���%�e���[��4_�Z�g��R�W���N+$�*�5�Q���]R=�h����P�8�(��a�����p"_
��L��W���8�*|s��<H5��8��!u���bB!
-�����P'{�����^��������?�mZ�����P��v�BB7x�e������S�wPY����_u�6�_5����w�1��T�
�&H!����:�G��~����qO�}y2\3�k�Z%Ks�,:�������s��.���?bN�����3[�_�MV5R��y��i�%�L�"���9�=����''/��
�����.�����y�R$?�?Y�����e�e�AF�or��@|����Y�� ��vv<��r����,���������l��tc�?�4�F��dxJ���Jiz���R��P��_&���6_Y���l0����
6����F_�L��k�sm�'��kA���'�JN���1���I�FU����F����"���bjP�R�����d�M���>3��#�
u��R�����k�N[�2SS�OY��v����
j�#�4��0�c��1jv��f>i�R�26[`9ux��F�M��E�oV2'����.�Y�-;��g�bwQl�,�+��T�C���/���SW���t���������;U�����
L�� �$�B�]��;�v���&`�:�)+�:��2���9�`i�@2�{��15����f����!�[�u5��vR�a�
dr���X�H9�lM,�����T+��&��$�8Sv�1�l*xJ�a^�K��{��-B��V�q�l���U�d.z��%�W��������&�����FK�)G�*� `;�U�JJ*
�Y��D�f&��2�d/a�������1�����$�L�qP�3C��r���K��a���cgV"3���b<�T+n�����6�dH��T+`H5fHp�.������y#��:4�H+���;����.+p�H�@�{�	M��-(A���K'jh�FQ�^y{�Gj�/p45����
q:�����+#,�r�B�%���nH�������n��%(�������.�O����A�6���I�Q_���F���Z����h�p����&�J�E�@�hD���mK�)�4���p�L�(���n6
"��h��J V�98|�?p+�mAj}F���,|Z�I��I�J_��^��6U��]��j����E8$;3�~���ni�Jh�������p)�&��������W(�v�4��c|)����ob�t����(�{n��:h���_�6�ZV<)A���(��B���]u��1M��A���JrT��E���}���� c;���Y6#
hr�@�S.���jkqu6~B�`�m*#�kfKm�L��jQ�_�|���s%
���o���^k�o+����$�'�m�����
�6��8����_���gC�6�%4��8��]s�sW���Fv�j�.B
��M���.�2��c�z���2����V�	������ �;w�3�7�,>�1�8lG�$ �Y����@�����NB�5�g�T������`��LSw$.��P�*w\��E����S���kl��d�"+����C���a��i������/E#������8�r�\C� ��&���/��q��������
�xk`�,m��}i=�������3��J�zPj������T��N�^9�P5�!5�8j�L)�0k����kR����[�����j����a6�N"z -��Zx��6�2�y���"J���:/h�O������V^B.u�pzU�����zi-8�A[4�x����@5���5��~8M�R-��@�~��'���Tt�\�Q���9��+R����g�V}	�l�[���L;~���N���!��V���`����������l������x:���N:�����T���mP/Q��Y�z�Q�Q?������%�8_��-M�Lu�1�	+���A�~�%	>im'x�������&���
\}�\�}-1�o��d+��"���>��R�e��L�S�?���Pb�p���R�n�*��X�m=xJ��/"��V:�+N6_����T+��A�D,c	��:j�������YL;�@�7�������K���S���X���>ye�%S���xb��V
����K���hTF~����*hE���	�l��I���X��]�X�u��h�J��T��ZZ��g���=p�� p��[GdC�F�Ql��M:e:��%�/���tg
G��d���������E�
_���%��o���_�����kZ��%����n�:pG}�O��� ������K���
�������0��.)m>�oR�����z�y&�v=8�yp������?����<���J��s\���Zs�p�W���5Q���~(k��E],z�
���rdfW(<����j�Kp�_��0������hn�X��v��}�����������i�����*���3����_���V��3
zI�nu{������7�.c�����Q�?�
8�&����r�
����LY��:���Z���pAM���p�����M|(*}�cE����o`��	��
_�DD���>�������i>���;�!K�I����Y�EmY">�b�@Ws���@o�F�2\��
�N�����Z�KJby:
!�%��'��LYhR��W8�-H+��$�R�Wl��9�W9�[�7!1� ����W^������E7�U��x-%�9���N�~t�g�2�m0��'��@tN��](�l��W-��yo����+T8��(5�|��U(��6]-&kg��[����gLkQ����_J
E�!t:�C��>.�� �:.z]N]2�����Q�����&ZMy��Jj��+G�Z�V���g��!���Jc���_�}�d��3�i�U��=)��k^�B'����k@�?>)"D �9�V�c2���Q 
����v��.����:�S.����Dj�`(�c�x���P�LfC�?�V������������Jg�\�`���������z`�*��J��w��}c�>��'��������q�,�������(���5XZ-�*������u�!U�V����������@)���n����Loq��\_D�&��=@��;�]����~7��0�5��H�F��D���A2w@Y���
r���U_��Q|�h�+-�b�
c)_����:u
1^{�k&�P[�`Z���ED�=�S��������b�`J@���,ax����'��!%-^@����j��
�sI���������H^���G
��7��:� ��%^������"V���$GT�Z4�b�Oz��U.��w{�0c����O��;J��b�@Y�9�tm75X,������\�Q����\e�O���n~��*^F{|gP�m����Oz
q�.�s�Z���,^
������q��[,J���`����%V������E����tY���'�9�OP;_��G'�e:]"��m���-��.os[�9���	VE��g5�DJ�M�^�60$E�X'A^D1��%V,[�G����^���J*��{��Z+�7k�w����?b���8\������2���x�p*���iaPe����J]� ����'0j�	N�C5,-%*�!���0B�������{�s}���&hs/s��}��]IMX������������{�������kB��@��+�g��)&��R��?���BEF��{��{�M�~R{f�r���
�Z4�l�%o#�:Y��Z!�.������1�X���(�N�e���/
�cF�Q>�?���������~�-��M>	�����:����9�����9	h^9�����y1;4y��A����S��R�Wf����u;]�cnM�[����?��=���)U�������/3�;�
@��/����%$����3u0%�M�
�Gk��2f�w���+���%�yE���T@^��Ya9�e��
������C�<�2�������a��l�38Q�H�	�X�di/Z����H���b���������J���"�ikwb���6�'��J��C!E���ci��]DWW���9���tb;���r4)8O(�����&����`��s��T*��������������i�<:��2��f�t������F7�n����(Yw��=��%D�=�U�G�k�Vnz�����|}
������ �8
`_��Bd�@�����P�
�`�DrMl�q����'��@��yf54>�bx��m%����������vJ@�sI�r������D�h8�Q����`���T�b�cU&T����D�vPS7��&v�24� OU2�����	\����
������</(����\�F@7�����a���*�sK�Bu�d��&����|�M2������_���z��R7�i����iw�8�#=�ty9�$�#}`]�P���s.@QJr�� 3s���#��
>#�+�"�z��m%��4Z��������K���&�j��Y{q#�E�`������`��1��m�\��Q��3_��9wT�
�$�HU�������N�g�����cJ�u����9��5QC��GW�@�1IZ���~������G�����@3L���9���n�SR�^�UW| 2yBN��&$wC�V t'�T�(&����jN���2Y~�5B;jga����}="�f��^��I^VeN����4�c��51m��ik���?�|&�"�([�@��>�2L>-S��������^�8���"���xN	?2X���h&��=�iu��J�O�l�9@#c�K�Rc�B�������$�n�sFI�%Q+LX��Lpcja����'.�*��������6��t�����>dA���|��i�p��w�kpj�+��f #��$��.��F^FZ�L%����<t\��?�����t���F�p���[4���1m�0��8g$�����hZ�������������Y���w����/���E���F2!�E<�^g������-���V�&��>��UF������L�!���B�L�N�r�cl�_s �l)��%��0�����Q�
��x����������)tq��������W�"����z71x�Qn����"�C)��J
� �
���d�����������`N�J�|!�f2tg�3�1;?�,Zi�x���W��b}���R��r1�f�m����c2���k|�h��	���>��
�p_R-+��^.Q����%����Z���w{���V���y�I�!~���\,2�����������UF��0p���+c������2_PY�
�p���X7��b�������99n�|����V�l��t��I}����Yc6W �[����������"%)ur�����w��!����������!�����E|�ln"��N2��Y��h��������8:�fQ��z&1YCn��H�;�~)�(�6f����P.�
��w���������|��l��j��#�����U[����X��X���&F��Zo&d�����d����5+����F��V�w�/�o��Z�A_�X�1�;:O�m��o��������c�/�9Y�Z��@S�~u�#�\������(�H���5M�-��cK���|�����<�yM��)��s�A�~4�r6+�������"c����=�����m�WR]f7�3�|f�7&����Fpl�XG�H\�'6n�B��_���$�m�J�Fb�W�W	B����lnG���Q�C���}#}Bbg0����8i� ��T�~��������C%��]�v&x���@j�
|�yj���"������2�I����)}W���X�d���M'xQ����}F��&FB�)�
�C���_+��P����C�������7�*���	�wz��{���Oj{�$��,O.�x�f�Y[�'D�mm/�c��!�h����g ����!�T��	�t[��NP�m���^���	���e�.�B�C�j���mW��C�"j�Bp�#�'�����*4����(��.9���v�2�>��2�����j^�bL��k��Z���R���B|(�A1l������[@Q���}���=Z�����[�	�1�"s�U��'�����C��p6X"�Z��I��v�];���	�b�V�9�����J3n�K�Gg�y~4B\��Q=8?<�ip~xQ%�@�pm������_	�����.<�eP�j�j\�U������60�W,(�zE
@�����u����7O��vdQ`�wV$����iL����c�7����y�c�(����������
���D]v���#v��BH#�������?:��0H��p]��m�=�����T����s
!��gwP�9Z���xz�+~����.@�q�\�4v�H'<�����&������v�|@���m	�2rp-��j�	HC��
@�����u/k�\��V��!_J��/����	2��������U����"�\���;27j��yi���GJ4�cb�Lb�jZ�B���U��.�Z��8i=����4��)�A��j�p_tD�����������t@rQ�	�HfIa0Xp��'���y��SCwo����F��{X*�����L*WAYB���Z��w�����K]��Dv.�m��.���[������IR��:OL��+NQ��P,��(�0���k����J�i8�\���Wr�0bp���M�
��BY)��jAD��<�����&�D1�[�u���	��<����>DsH�|�MEK5�	�Y1X�ap������X0}c3I����?��r|��&����5������,=����g���u�������{������Q���Q:���6���� ��18���c6!�0p|�V��s�\l-���8��z�*��2��eJh7H��pjrI3z��L�Ph�"�@��U�Y
�vj�$k�:���KF���7�������O�J���]��a5A�����{(����Cg��������,��[�_pY����/tQu<�{{
���Ar2����@Uv�0������"�M�t>�M�Gy�����/�M-m�i0����G��T����Z��0���d�sf<R����cT���)��oI��|�aB��K[w�-Q��a���$����Kk�l)z�&M/26��l/O3�dGiq3Th5t-�� �k�tjv�1n0�@��
V��x��(�ldE���S�g�m���){��v����o��������A���-h�R�!\�v;����4<�!!=
����F���d����\������2������6�zy�J�� ��\:9���X_�8��"fC��8w�j i��p���U�S�����V�97��'�R�U���?5�M���f�
�R{�U)(�r����%�H�[�� �����^��x��$mc�=L3�A$�=���jZ�`sM	L�h�q2e��Wb|
-�9s����D
Bt�49t�3m�$\��k�+.���u5���xA���n#��/�x;2�&	�x�$��8�`�1!�4��ba�����g$��77�}�������PS�g��Q��	���[�"�N���w�va�����mXv��c���1���l���[��~�����h���	���d
&F�M��^H�����.8]D���n+�yw��S���m1vw�z��������#�3+�q���r��]
�%z����C%��L����/�Ul(}y&(����}w`���f!Y�Jo�M1���r������/.��"^�r�<���I3l"���xJ!B�K��&��v��� ����%FM�vW#M�S���-��swe��,�AJ'(�vk&�E4_�b#��m~(�W�Uo!F���[o��n�C�N��+Pa�g��������x.z`��,�o�0������&�7�d7����f��Y��dT�
U�r�K�%�T�N����@���?vJ�+����C"
��7'��^98y;x����/�"0���{j���hZy��.������p����m�)��d%�� ��F(#���A�j�%&���/d���v�1K�!VcUK���Y�i�\����H�K��f�-m��#A�Y��d	�R�x#�.��<���8^2�B(�{7 �\R�6l�,�����A[�$-8�I67B~l���i�!�:'v����=c"�����
����������	���l\�*��`yJ���"�|
��/�'gl2*�i��{�N��c|�ut+��H4w>dj�^W��Q���=Aa��tF�%EQ����90�������W�����)o�jA5)��Z��:'���)&)_���Y'��{���������s���/�U:a����-.1C)���3d����*�H���Kgy�8�o��0P��^p�����e��a�X�0H�=��\K$L��E�_"�rr{�����$%��1�xAk2<��O�I��m��
�qSp
�=A+CO�sTV_���k���v�����J��c��I�S���OE4�7���AD��RU\s(
����s�{5��%�B��hy/8����v�U1@2b#D'_����=�#��BP��������������j2)�O��3r��]����#V�����
Y��e����#:�����7rOK��}P4�8�PmDyE&A�]��+ny�b�8��y_�������KH��~RW�F�F�
�@�!���U���zK�{������Q�7���
��j��/S�6ru��54���w����u�������n�g0`���	/$����n��9Y �e(������3#�~Fj�����}6#K�[�j	b���.�%�,�������0�5;����F�
A.A�����Pj��k�������4 B)�C���I���4��%�p����ygQ��`>�NS�?���{A�����Z��}+�sQ�5%gnT��NV��
;��_���%�u���I�i�Eg�����i4N�����	:B��k7�1���
<)E�e�al
�9���}�����?����a�l�����68>�{��q��<���P0���D%K��\�Qb����
%����j�S���EF�<�J" �R�D�
u����8�A�q��\����� m�`���\�'�|�*��0���!��v���(^,`XV22:�����u�������7�gRx����>Tz�F2Gb�C���5���?e�a��J����!�|;�X�
���P�*�).��#�B��s�k��/�0u��;Q84����3m��#Cd����a�jGX��}`��J���Wm��*���XbWs�_���|��-?c���&�W����>LZ�����y���7UX���P�*��
�v��
�=��|�;��?�����>�{������&�#�'y����1�$���� 7*�l�>�o�2��_h�r�{&W�0��%'����Z����-��E 9^���APiF0��}��g^<��%54R42�f�d����svg�7df�A1Q_�;���"��smZ�(��7=������q�C	�|Q���d�l�kH���f�2Rb�=,�K�$��d��rD5��2s�'��]��xrK�T~g���;�g�Q������2rf���s�H�i��V�J?)�V�����n���$-Fz�������hY+X����]��<��2-x��n�4�R�&��&���Q8[2;	����Q	��Q63���;!#
d?�c��s}yHyE��y i��sXQ��f;�`�ND������l@f�����`��TP\�s�C�"F��O�p�)��F�f�J��` �ryx�����������m��2�������)��,{�Q����z����e�����eZNs��b�Zg.x�j_�����������Nkui+�-t�u)����i��K������/8r�j���!g��H�g�W�����K}��^����W�xPb\l���_C��M�B�'���\`$7����/���2��?~�~{[����4xBZo�w�����Lv�DQ!z�u����r�LmFM�g|W^_h$
?����h��N'1�����#���k\����;q`t�Q�\���"��fx�1�����9�]����q����~�D[� M����7�g4Q�.;o8���
��`�,J�p��5ST�IV*�G&W5�x��`�����B���V���.Gb��yXrz4����y�O��s��Ns�����
�,S�����@s3K��
�Pkoww	b4��/YF7�U���������%�7��G�/9��0�q����l���rh��(I�i�P��5*�+E�x`B����2
B$eH�|G��i���eO���}���&��d������:Ylfg��9���	Lvqw�Q��Mr�:� J��l���x�>���1/��D�i���c�m�m���_4v���D\a\����3D��G�Z�v�<h�3�������`��n��2Z�/!P&�^�D�?
P=���a�^�.-#�}q�Z����)�����z���kA:e�]���in9=���O�r&�sh��#3p�=�R�[��������Hr�������(D���Y�n�N.�����S,|�l�����<09�W��Ux�B�;���/MM������H	�P-�����3:��?{�Y�������[�t����RV�"���y����
<��,��y��x>W����drK������Q��m]6����h<��g��o)�_:�,l���^��
j�a���{���t�P
���LbuE��e��������!��W��"���������dY����m<q��m1�qp�r$�%�s�wb
��i�`4�u�����+���������#�����8�K��(��	�x&7L�HH�{��\��U�u[����8���$����=��h����Y2�S9U����t����f<��9��Q�7�3%O�����Vw�3�6���dw�S����F��|�����m����n������N���h>���}�)�-��o�;��������� s�r0����v�s�`�)���K���`*m��A�Z����$��h���Y��'�`;�E
!�^����&;k6�]�dw�&7W7�Y�I>3�M����H��6�CBv�<"�FOI�V������}�!�����baIe��q.�H��U�|zs�x>���7�?�e��\�6��^�������{���J����a��]���/��,������q��PO���Q���y��X����cP���,V�`��i	4������+M0{�N�,c
q�"�;o,H���>��eTIh�j��~�[��G��T���ZF�tp3���_<�+���PC�D�#����������v�|_I�d�%�����FH���9���h���_�J���Y�+}m�K������}�����h�<�|�Is��M�K������7{��hg3���������*t��d����F��S4F �dy��������'J|�q���7��E����y��C��J�{�r�P	�����AQ����oc���P�B�y�b�[���o����CI�cWdh�i�w0!]�U��VOW�Q6MZ��E�&��}y-��Fj�R�1g���$s�d���e
��ZZ,���[H7������P�$�NS~
,\�P`��\�Q��(�Pj����H���MkS��������	��(�__|��e�zg9�c�������:��K����f
J0/�L.'��N������.�����
���J�J���J^����
���9!Y.nGK�(��x6���g��j����B���\�S2��������������y�Ye�s�f�T��Kr)�������G�� n��
b�c��d2O.0��7������t����F�u��.�3~�X�R���$�\��1��0P���e��F�AfA�J��a�����{�����=AI��C�F�$ ;�����1��2��w��.�h9��PYd���06o<�O���iG
@vG4$��m,���g�+,!���(��`k6� �g��H��dZ?N�wc����4���	�����I�g�����S@J���������4��yFJ���I����/v��Xd��xhG��G��r��y���_6�@Tw�3D��:u��a��er�I��b�3�K���`�>���5��?������0A��axeR�������%�����^��z
3� �� 6��)�$�	�4�2���E����I�!�uYh18�F�Q��Gm���%������|�dF0rgu�%�/��N-Q�'�H��Lw`����s�/25DC������w��/�u����*��
M��������#���%���'7����y�>��~�r� �"$���P�}�aM�DR<��"'�
]����a��=c4�_��f��u��\/0�U��K>�H���[���������s�Aj2Y"��cY (�6�6��AH9�������w����IL�����K�q4���c�9����+����X��ya��9�Y��7{�jU��w��m���c�&��!l6�L�g��E<��g�3-l��Z=l��_^���������h+��1?�K<X��?��n��t0�������30�������B"��T��;�Q�:7E��U{�[�kX3��e
 �C��"��`$f8�;m��$�v�XK����W��Z����������{n8:J@�
�y
24TY���Uj���gDo�%��"zg�[H�6���lJ->{����b$��m�O�����<}����_�N�
�� �HJB�	����6f���r��o����t��l�/������W�v��{V����\�����n�-9�i0�H�a-MA�����Y������+�uX-�"�>H3v������N8��|r#(IK���+��D�H��]���u��0G"����8.�1)��$4�R���)�=�����g�qy�0cyws���.��������u|����]��������|��a�4��{�m�Pb�I�����I����*����D:u<7�	������+�+��|���k=edvT�j��a#�*��)a���)a��k��;K�K4���O��2�f�����Q��������6�F���Zr:�vV7��6��s�������]fl�*O/����x�� ��&��|��O���^�������8O��b�\}��#��W�oA7����R+�������/��s�+�,���<;	�C��?>����Lv�7G��gX*���R?������{�.tp��O��k]\�4p|��}���~�7�A��Md3bkZ@GM�z�"����w�r��f���B�w�^�h,wd��J#� dC��������
��.JF�}'���W�����o~0%���5l{��fs�5��vK��6
�V)4�w����J�]hg�7��u�����[�Y~�(E��]4^^3|D����<3�p��(=�}���z�s+��*������t�W�<qk�/i�
���� +�y6�I�;N�����m��r	����m0kL���-��V[��N����mM������v�*�.x�����7�7[�k7o:�w7{J��h�\;�/;Ah������E$��< P����P(��CE)�@�����_�] ��K�Iv�L��<��^�wG���lo���	[W�R�~������5�Eq�r�OR�����C��N�|��$K��%�{i<>�v
�v�nzzl;2�j��`�q�e��u���,q�����e��"&��'.��Gi"K���3��v���x{��S�E3
d�.So��6���k'M{�IES%l��W���J\�^x���l�~������;:y��zs��#���oN��o�/�ejp�[��-(��]�V���o�g��j�)!�m�����<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�<�����1RFH

#21

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#20)

Re: WIP: [[Parallel] Shared] Hash

Hi Thomas,

On Fri, Jan 27, 2017 at 5:03 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I have broken this up into a patch series, harmonised the private vs
shared hash table code paths better and fixed many things including
the problems with rescans and regression tests mentioned upthread.
You'll see that one of the patches is that throwaway BufFile
import/export facility, which I'll replace with your code as
discussed.

I'll try to get back to this ASAP, but expect to be somewhat busy next
week. Next week will be my last week at Heroku.

It was not an easy decision for me to leave Heroku, but I felt it was
time for a change. I am very grateful to have had the opportunity. I
have learned an awful lot during my time at the company. It has been
excellent to have an employer that has been so supportive of my work
on Postgres this whole time.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#8)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Jan 7, 2017 at 9:01 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Stepping back a bit, I am aware of the following approaches to hash
join parallelism:

1. Run the inner plan and build a private hash table in each
participant [...].

2. Run a partition-wise hash join[1]. [...]

3. Repartition the data on the fly, and then run a partition-wise
hash join. [...]

4. Scatter both the inner and outer plans arbitrarily across
participants [...], and build a shared hash
table. [...]

[...] I suspect that 4 is probably a better
fit than 3 for Postgres today, because the communication overhead of
shovelling nearly all tuples through extra tuple queues to route them
to the right hash table would surely be very high, though I can see
that it's very attractive to have a reusable tuple repartitioning
operator and then run k disjoint communication-free joins (again,
without code change to the join operator, and to the benefit of all
join operators).

On this topic, I recently stumbled on the 2011 paper "Design and
Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs"[1]https://pdfs.semanticscholar.org/9de4/b32f2c7b630a4f6aae6994a362a46c7c49e9.pdf
and found it reassuring. It compares simple shared hash tables to
some state-of-the-art repartitioning approaches (including the radix
join algorithm which performs the amazing feat of building a lot of
cacheline-sized hash tables and then runs with minimal cache misses).

From the introduction:

"Second, we show that an algorithm that does not do any partitioning,
but simply constructs a single shared hash table on the build relation
often outperforms more complex algorithms. This simple
“no-partitioning” hash join algorithm is robust to sub-optimal
parameter choices by the optimizer, and does not require any knowledge
of the characteristics of the input to work well. To the best of our
knowledge, this simple hash join technique differs from what is
currently implemented in existing DBMSs for multi-core hash join
processing, and offers a tantalizingly simple, efficient, and robust
technique for implementing the hash join operation."

"Finally, we show that the simple “no-partitioning” hash join
algorithm takes advantage of intrinsic hardware optimizations to
handle skew. As a result, this simple hash join technique often
benefits from skew and its relative performance increases as the skew
increases! This property is a big advancement over the
state-of-the-art methods, as it is important to have methods that can
gracefully handle skew in practice [8]."

(That relates to SMT pipelining compensating for the extra cacheline
misses during probing by doing thread A's work while waiting for
thread B's memory to be fetched.)

From the conclusion:

"... Our results show that a simple hash join technique that does not
do any partitioning of the input relations often outperforms the other
more complex partitioning-based join alternatives. In addition, the
relative performance of this simple hash join technique rapidly
improves with increasing skew, and it outperforms every other
algorithm in the presence of even small amounts of skew."

For balance, the authors of a 2013 paper "Main-Memory Hash Joins on
Multi-Core CPUs: Tuning to the Underlying Hardware"[2]https://www.inf.ethz.ch/personal/cagri.balkesen/publications/parallel-joins-icde13.pdf are less keen
on the simple "hardware-oblivious" "no partitioning" approach and
don't buy the other paper's ideas about SMT. Incidentally, their
results on the benefits of large (huge) pages are interesting (table
VI) and suggest that huge page support for DSM segments could be good
here.

[1]: https://pdfs.semanticscholar.org/9de4/b32f2c7b630a4f6aae6994a362a46c7c49e9.pdf
[2]: https://www.inf.ethz.ch/personal/cagri.balkesen/publications/parallel-joins-icde13.pdf

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Thomas Munro (#20)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Jan 28, 2017 at 10:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I have broken this up into a patch series, harmonised the private vs
shared hash table code paths better and fixed many things including
the problems with rescans and regression tests mentioned upthread.
You'll see that one of the patches is that throwaway BufFile
import/export facility, which I'll replace with your code as
discussed.

Patch moved to CF 2017-03.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#20)

Re: WIP: [[Parallel] Shared] Hash

0003-hj-refactor-memory-accounting-v4.patch:

Modify the existing hash join code to work in terms of chunks when
estimating and later tracking memory usage. This is probably more
accurate than the current tuple-based approach, because it tries to
take into account the space used by chunk headers and the wasted space
in chunks. In practice the difference is probably small, but it's
arguably more accurate; I did this because I need chunk-based
accounting the later patches. Also, make HASH_CHUNK_SIZE the actual
size of allocated chunks (ie the header information is included in
that size so we allocate exactly 32KB, not 32KB + a bit, for the
benefit of the dsa allocator which otherwise finishes up allocating
36KB).

I looked at this patch. I agree that it accounts the memory usage more
accurately. Here are few comments.

spaceUsed is defined with comment
Size spaceUsed; /* memory space currently used by tuples */

In ExecHashTableCreate(), although the space is allocated for buckets, no
tuples are yet inserted, so no space is used by the tuples, so going strictly
by the comment, spaceUsed should be 0 in that function. But I think the patch
is accounting the spaceUsed more accurately. Without this patch, the actual
allocation might cross spaceAllowed without being noticed. With this patch
that's not possible. Probably we should change the comment to say memory space
currently allocated. However, ExecHashIncreaseNumBatches() may change the
number of buckets; the patch does not seem to account for spaceUsed changes
because of that.

Without this patch ExecHashTableInsert() used to account for the space used by
a single tuple inserted. The patch moves this calculation in dense_alloc() and
accounts for out-of-bound allocation for larger tuples. That's good.

The change in ExecChooseHashTableSize() too looks fine.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed.

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; a reason why we don't track space usage accurately now. But I
think I will leave it to be judged by someone who is more familiar with the
code and possibly has historical perspective.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Ashutosh Bapat (#24)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Feb 1, 2017 at 2:10 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

0003-hj-refactor-memory-accounting-v4.patch:
[...]

I looked at this patch. I agree that it accounts the memory usage more
accurately. Here are few comments.

Thanks for the review!

spaceUsed is defined with comment
Size spaceUsed; /* memory space currently used by tuples */

In ExecHashTableCreate(), although the space is allocated for buckets, no
tuples are yet inserted, so no space is used by the tuples, so going strictly
by the comment, spaceUsed should be 0 in that function. But I think the patch
is accounting the spaceUsed more accurately. Without this patch, the actual
allocation might cross spaceAllowed without being noticed. With this patch
that's not possible. Probably we should change the comment to say memory space
currently allocated.

Right, that comment is out of date. It is now the space used by the
bucket array and the tuples. I will fix that in the next version.

However, ExecHashIncreaseNumBatches() may change the
number of buckets; the patch does not seem to account for spaceUsed changes
because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,

hashtable->nbuckets_optimal);

+       /* account for the increase in space that will be used by buckets */
+       hashtable->spaceUsed += sizeof(HashJoinTuple) *
+               (hashtable->nbuckets_optimal - hashtable->nbuckets);
+       if (hashtable->spaceUsed > hashtable->spacePeak)
+               hashtable->spacePeak = hashtable->spaceUsed;
+
        hashtable->nbuckets = hashtable->nbuckets_optimal;
        hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;

It knows that spaceUsed already includes the old bucket array
(nbuckets), so it figures out how much bigger the new bucket array
will be (nbuckets_optimal - nbuckets) and adds that.

Without this patch ExecHashTableInsert() used to account for the space used by
a single tuple inserted. The patch moves this calculation in dense_alloc() and
accounts for out-of-bound allocation for larger tuples. That's good.

The change in ExecChooseHashTableSize() too looks fine.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed.

I figured there was no way that the new spaceUsed value could be
bigger than spacePeak, because we threw out all chunks and have just
the bucket array, and we had that number of buckets before, so
spacePeak must at least have been set to a number >= this number
either when we expanded to this many buckets, or when we created the
hashtable in the first place. Perhaps I should
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; a reason why we don't track space usage accurately now. But I
think I will leave it to be judged by someone who is more familiar with the
code and possibly has historical perspective.

Well it's not doing more work; it doesn't make any practical
difference whatsoever but it's technically doing less work than
master, by doing memory accounting only when acquiring a new 32KB
chunk. But if by overdoing it you mean that no one really cares about
the tiny increase in accuracy so the patch on its own is a bit of a
waste of time, you're probably right. Depending on tuple size, you
could imagine something like 64 bytes of header and unused space per
32KB chunk that we're not accounting for, and that's only 0.2%. So I
probably wouldn't propose this refactoring just on accuracy grounds
alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly
IPC if done for every time a tuple is allocated. I considered using
atomic add to count space per tuple, or maintaining per-backend local
subtotals and periodically summing. Then I realised that switching to
per-chunk accounting would fix the IPC problem AND be justifiable on
theoretical grounds. When we allocate a new 32KB chunk, we really are
using 32KB more of your memory.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#25)

Re: WIP: [[Parallel] Shared] Hash

However, ExecHashIncreaseNumBatches() may change the
number of buckets; the patch does not seem to account for spaceUsed changes
because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,

hashtable->nbuckets_optimal);
+       /* account for the increase in space that will be used by buckets */
+       hashtable->spaceUsed += sizeof(HashJoinTuple) *
+               (hashtable->nbuckets_optimal - hashtable->nbuckets);
+       if (hashtable->spaceUsed > hashtable->spacePeak)
+               hashtable->spacePeak = hashtable->spaceUsed;
+

Sorry, I missed that hunk. You are right, it's getting accounted for.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed.

I figured there was no way that the new spaceUsed value could be
bigger than spacePeak, because we threw out all chunks and have just
the bucket array, and we had that number of buckets before, so
spacePeak must at least have been set to a number >= this number
either when we expanded to this many buckets, or when we created the
hashtable in the first place. Perhaps I should
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help, better if you explain this with a comment before Assert.

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; a reason why we don't track space usage accurately now. But I
think I will leave it to be judged by someone who is more familiar with the
code and possibly has historical perspective.

Well it's not doing more work; it doesn't make any practical
difference whatsoever but it's technically doing less work than
master, by doing memory accounting only when acquiring a new 32KB
chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now, that happens only after the hash table is
built completely. But that's fine. I am not worried about whether the
it's less work or more.

But if by overdoing it you mean that no one really cares about
the tiny increase in accuracy so the patch on its own is a bit of a
waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it better.

Depending on tuple size, you
could imagine something like 64 bytes of header and unused space per
32KB chunk that we're not accounting for, and that's only 0.2%. So I
probably wouldn't propose this refactoring just on accuracy grounds
alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly
IPC if done for every time a tuple is allocated. I considered using
atomic add to count space per tuple, or maintaining per-backend local
subtotals and periodically summing. Then I realised that switching to
per-chunk accounting would fix the IPC problem AND be justifiable on
theoretical grounds. When we allocate a new 32KB chunk, we really are
using 32KB more of your memory.

+1.

Thanks for considering the comments.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Ashutosh Bapat (#26)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Feb 1, 2017 at 10:13 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

However, ExecHashIncreaseNumBatches() may change the
number of buckets; the patch does not seem to account for spaceUsed changes
because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,

hashtable->nbuckets_optimal);
+       /* account for the increase in space that will be used by buckets */
+       hashtable->spaceUsed += sizeof(HashJoinTuple) *
+               (hashtable->nbuckets_optimal - hashtable->nbuckets);
+       if (hashtable->spaceUsed > hashtable->spacePeak)
+               hashtable->spacePeak = hashtable->spaceUsed;
+
Sorry, I missed that hunk. You are right, it's getting accounted for.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed.

I figured there was no way that the new spaceUsed value could be
bigger than spacePeak, because we threw out all chunks and have just
the bucket array, and we had that number of buckets before, so
spacePeak must at least have been set to a number >= this number
either when we expanded to this many buckets, or when we created the
hashtable in the first place. Perhaps I should
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help, better if you explain this with a comment before Assert.

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; a reason why we don't track space usage accurately now. But I
think I will leave it to be judged by someone who is more familiar with the
code and possibly has historical perspective.

Well it's not doing more work; it doesn't make any practical
difference whatsoever but it's technically doing less work than
master, by doing memory accounting only when acquiring a new 32KB
chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now, that happens only after the hash table is
built completely. But that's fine. I am not worried about whether the
it's less work or more.

But if by overdoing it you mean that no one really cares about
the tiny increase in accuracy so the patch on its own is a bit of a
waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it better.

Depending on tuple size, you
could imagine something like 64 bytes of header and unused space per
32KB chunk that we're not accounting for, and that's only 0.2%. So I
probably wouldn't propose this refactoring just on accuracy grounds
alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly
IPC if done for every time a tuple is allocated. I considered using
atomic add to count space per tuple, or maintaining per-backend local
subtotals and periodically summing. Then I realised that switching to
per-chunk accounting would fix the IPC problem AND be justifiable on
theoretical grounds. When we allocate a new 32KB chunk, we really are
using 32KB more of your memory.

+1.

Thanks for considering the comments.

Hello Thomas,

I was performing performance analysis of this set of patches on TPC-H
higher scale factor and came across following cases:
- Only 6 queries are using parallel hash
- Q8, is showing regression from 8 seconds on head to 15 seconds with
this patch set
- In the remaining queries, most are not showing significant
improvement in performance, numbers are,

Query | Head | with patch
---------|----------------|----------------
3 | 72829.921 | 59915.961
5 | 54815.123 | 55751.214
7 | 41346.71 | 46149.742
8 | 8801.814 | 15049.155
9 | 62928.88 | 59077.909
10 | 62446.136 | 61933.278

Could you please look into this regression case, also let me know if
the setup I am using is something that is expectant to give such
performance for your patch, or is there anything else you might want
to point out. Let me know if you need any more information for these
tests.

Experimental setup is as follows:
Scale factor: 20
work_mem = 64 MB
max_parallel_workers_per_gather = 4
shared_buffers = 8GB
effective_cache_size = 10 GB
Additional indexes are on columns (all individually) l_shipdate,
l_shipmode, o_comment, o_orderdate, c_mktsegment.

For the output plans on head and with this set of patch please check
the attached tar folder.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#28

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Rafia Sabih (#27)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

9 | 62928.88 | 59077.909

Thanks Rafia. At first glance this plan is using the Parallel Shared
Hash in one place where it should pay off, that is loading the orders
table, but the numbers are terrible. I noticed that it uses batch
files and then has to increase the number of batch files, generating a
bunch of extra work, even though it apparently overestimated the
number of rows, though that's only ~9 seconds of ~60. I am
investigating.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#28)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

9 | 62928.88 | 59077.909

Thanks Rafia. At first glance this plan is using the Parallel Shared
Hash in one place where it should pay off, that is loading the orders
table, but the numbers are terrible. I noticed that it uses batch
files and then has to increase the number of batch files, generating a
bunch of extra work, even though it apparently overestimated the
number of rows, though that's only ~9 seconds of ~60. I am
investigating.

Hi Thomas,
Apart from the previously reported regression, there appear one more
issue in this set of patches. At times, running a query using parallel
hash it hangs up and all the workers including the master shows the
following backtrace,

#0 0x00003fff880c7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1 0x00000000104e2718 in WaitEventSetWaitBlock (set=0x100157bde90,
cur_timeout=-1, occurred_events=0x3fffdbe69698, nevents=1) at
latch.c:998
#2 0x00000000104e255c in WaitEventSetWait (set=0x100157bde90,
timeout=-1, occurred_events=0x3fffdbe69698, nevents=1,
wait_event_info=134217745) at latch.c:950
#3 0x0000000010512970 in ConditionVariableSleep (cv=0x3ffd736e05a4,
wait_event_info=134217745) at condition_variable.c:132
#4 0x00000000104dbb1c in BarrierWaitSet (barrier=0x3ffd736e0594,
new_phase=1, wait_event_info=134217745) at barrier.c:97
#5 0x00000000104dbb9c in BarrierWait (barrier=0x3ffd736e0594,
wait_event_info=134217745) at barrier.c:127
#6 0x00000000103296a8 in ExecHashShrink (hashtable=0x3ffd73747dc0) at
nodeHash.c:1075
#7 0x000000001032c46c in dense_alloc_shared
(hashtable=0x3ffd73747dc0, size=40, shared=0x3fffdbe69eb8,
respect_work_mem=1 '\001') at nodeHash.c:2618
#8 0x000000001032a2f0 in ExecHashTableInsert
(hashtable=0x3ffd73747dc0, slot=0x100158f9e90, hashvalue=2389907270)
at nodeHash.c:1476
#9 0x0000000010327fd0 in MultiExecHash (node=0x100158f9800) at nodeHash.c:296
#10 0x0000000010306730 in MultiExecProcNode (node=0x100158f9800) at
execProcnode.c:577

The issue is not deterministic and straightforwardly reproducible,
sometimes after make clean, etc. queries run sometimes they hang up
again. I wanted to bring this to your notice hoping you might be
faster than me in picking up the exact reason behind this anomaly.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Rafia Sabih (#29)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 2, 2017 at 4:57 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Apart from the previously reported regression, there appear one more
issue in this set of patches. At times, running a query using parallel
hash it hangs up and all the workers including the master shows the
following backtrace,

Ugh, thanks. Investigating.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#20)

Re: WIP: [[Parallel] Shared] Hash

0004-hj-refactor-batch-increases-v4.patch:

Modify the existing hash join code to detect work_mem exhaustion at
the point where chunks are allocated, instead of checking after every
tuple insertion. This matches the logic used for estimating, and more
importantly allows for some parallelism in later patches.

The patch has three changes
1. change dense_alloc() to accept respect_workmem argument and use it
within the function.
2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from
ExecHashTableInsert() to account for memory before inserting new tuple
3. Check growEnabled before calling ExecHashIncreaseNumBatches().

I think checking growEnabled within ExecHashIncreaseNumBatches() is
more easy to maintain that checking at every caller. If someone is to
add a caller tomorrow, s/he has to remember to add the check.

It might be better to add some comments in
ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
called with respect_work_mem = false? ExecHashSkewTableInsert() does
call ExecHashIncreaseNumBatches() after calling
ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
expect increase in space used and thus go beyond work_mem for a short
while. Is there a way we can handle this case in dense_alloc()?

Is it possible that increasing the number of batches changes the
bucket number of the tuple being inserted? If so, should we
recalculate the bucket and batch of the tuple being inserted?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Rafia Sabih (#29)

2 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 2, 2017 at 4:57 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

[ regressions ]

Thanks Rafia. At first glance this plan is using the Parallel Shared
Hash in one place where it should pay off, that is loading the orders
table, but the numbers are terrible. I noticed that it uses batch
files and then has to increase the number of batch files, generating a
bunch of extra work, even though it apparently overestimated the
number of rows, though that's only ~9 seconds of ~60. I am
investigating.

Hi Thomas,
Apart from the previously reported regression, there appear one more
issue in this set of patches. At times, running a query using parallel
hash it hangs up and all the workers including the master shows the
following backtrace,

Here's a new version to fix the problems reported by Rafia above. The
patch descriptions are as before but it starts from 0002 because 0001
was committed as 7c5d8c16 (thanks, Andres).

First, some quick master-vs-patch numbers from the queries listed with
regressions, using TPCH dbgen scale 10, work_mem = 64MB,
max_parallel_workers_per_gather = 4, shared_buffers = 8GB (the numbers
themselves not comparable as different scale and different hardware).
Better except for Q5 and Q8, which for some mysterious reason plans
only one worker and then loses. I'm looking into that.

Q3 19917.682 -> 8649.822
Q5 4149.983 -> 4192.551
Q7 14453.721 -> 10303.911
Q8 1981.540 -> 8030.264
Q9 26928.102 -> 17384.607
Q10 16955.240 -> 14563.787

I plan to explore the performance space with a range of worker numbers
and work_mem sizes and do some analysis; more soon.

Changes:

1. Fixed two bugs that resulted in ExecHashShrink sometimes hanging,
as reported by Rafia. (1) When splitting the large v3 patch up into
smaller patches for v4, I'd managed to lose the line that initialises
shared->shrink_barrier, causing some occasional strange behaviour.
(2) I found a bug[1]/messages/by-id/CAEepm=3a4VaPFnmwcdyUH8gE5_hW4tRvXQkpfQyrzgDQ9gJCYw@mail.gmail.com in condition_variable.c that could cause hangs
and fixed that via a separate patch and the fix was committed as
3f3d60d3 (thanks, Robert).

2. Simplified barrier.c by removing BarrierWaitSet(), because that
turned out to be unnecessary to implement rescan as I'd originally
thought, and was incompatible with the way BarrierDetach() works. The
latter assumes that the phase only ever increments, so that
combination of features was broken.

3. Sorted out the hash table sizing logic that was previously leading
to some strange decisions about batches. This involved putting the
total estimated number of inner rows into the path and plan when there
is a partial inner plan, because plan_rows only has the partial
number. I need to size the hash table correctly at execution time.
It seems a bit strange to do that specifically and only for Hash (see
rows_total in the 0008 patch)... should there be some more generic
way? Should total rows go into Plan rather than HashPlan, or perhaps
the parallel divisor should go somewhere?

4. Comments fixed and added based on Ashutosh's feedback on patch 0003.

5. Various small bug fixes.

I've also attached a small set of test queries that hit the four
"modes" (for want of a better word) of our hash join algorithm for
dealing with different memory conditions, which I've nicknamed thus:

1. "Good": We estimate that the hash table will fit in work_mem, and
at execution time it does. This patch makes that more likely because
[Parallel] Shared Hash gets to use more work_mem as discussed.

2. "Bad": We estimate that the hash table won't fit in work_mem, but
that if we partition it into N batches using some bits from the hash
value then each batch will fit in work_mem. At execution time, each
batch does indeed fit into work_mem. This is not ideal, because we
have to write out and read back in N - (1 / N) inner and outer tuples
(ie all batches except the first one, although actually costsize.c
always charges for all of them). But it may still be better than
other plans, and the IO is sequential. Currently Shared Hash
shouldn't be selected over (private) Hash if it would require batching
anyway due to the cpu_shared_tuple_cost tie-breaker: on the one had it
avoids a bunch of copies of the batch files being written out, but on
the other it introduces a bunch of synchronisation overhead. Parallel
Shared Hash is fairly likely to be chosen if possible be due to
division of the inner relation's cost outweighing
cpu_shared_tuple_cost.

3. "Ugly": We planned for "good" or "bad" mode, but we ran out of
work_mem at some point during execution: this could be during the
initial hash table load, or while loading a subsequent batch. So now
we double the number of batches, splitting the current batch and all
batches that haven't been processed yet into two in the hope of
shrinking the hash table, while generating extra reading and writing
of all as-yet unprocessed tuples. This patch can do the shrinking
work in parallel, which may help.

4. "Fail": After reaching "ugly" mode (and perhaps trying multiple
times to shrink the hash table), we deduce that there is a kind of
extreme skew that our partitioning scheme can never help with. So we
stop respecting work_mem and hope for the best. The hash join may or
may not be able to complete, depending on how much memory you can
successfully allocate without melting the server or being killed by
the OOM reaper.

The "ugly" mode was added in 2005[1]/messages/by-id/CAEepm=3a4VaPFnmwcdyUH8gE5_hW4tRvXQkpfQyrzgDQ9gJCYw@mail.gmail.com, so before that we had only
"good", "bad" and "fail". We don't ever want to be in "ugly" or
"fail" modes: a sort merge join would have been better, or in any
case is guaranteed to be able to run to completion in the configured
space. However, at the point where we reach this condition, there
isn't anything else we can do.

Some other interesting cases that hit new code are: rescan with single
batch (reuses the hash table contents), rescan with multiple batches
(blows away and rebuilds the hash table), outer join (scans hash table
for unmatched tuples). Outer joins are obviously easy to test but
rescans are a bit tricky to reach... one way is to run TPCH Q9 with
cph_shared_tuple_cost = -10 (I think what's happening here is that
it's essentially running the optimiser in reverse, and a nested loop
rescanning a gather node (= fork/exit workers for every loop) is about
the worst plan imaginable), but I haven't found a short and sweet test
query for that yet.

Some assorted thoughts:

* Instead of abandoning our work_mem limit in "fail" mode, you might
think we could probe the portion of the hash table that we managed to
load so far, then rewind the outer batch and probe again using the
next work_mem-sized portion of the same inner batch file. This
doesn't work though because in the case of work_mem exhaustion during
the initial batch it's too late to decide to start recording the the
initial outer batch, so we have no way to rewind.

* Instead of using the shared hash table for batch mode, we could do
just the initial batch with a shared hash table, but drop back to
smaller private hash tables for later batches and give each worker its
own batch to work until they're all done with no further
communication. There are some problems with this though: inability to
handle outer joins (just like parallel hash join in 9.6), limit of
work_mem (not work_mem * P) for the private hash tables, load
balancing/granularity problems with skewed data. Thanks to my
colleague Ashutosh Bapat for this off-list suggestion.

One of the unpleasant things about this patch is the risk of deadlock,
as already discussed. I wanted to mention an idea for how to get rid
of this problem eventually. I am aware of two ways that a deadlock
could happen:

1. A worker is waiting to write into its tuple queue (because the
reader is not consuming fast enough and its fixed buffer has filled
up), but the leader (which should be reading the tuple queue) is stuck
waiting for the worker. This is avoided currently with the early-exit
protocol, at the cost of losing a CPU core after probing the first
batch.

2. Two different hash joins run in non-deterministic order. Workers
A and B have executed hash join nodes 1 and 2 at least once and
attached to the barrier, and now Worker A is in hash join node 1, and
worker B is in hash join node 2 at a barrier wait point. I am not
aware of any executor nodes that could do that currently, but there is
nothing to say that future nodes couldn't do that. If I am wrong
about that and this could happen today, that would be fatal for this
patch in its current form.

Once we have asynchronous execution infrastructure, perhaps we could
make those problems go away like this:

1. Introduce a new way for barrier clients to try to advance to the
next phase, but detach and return immediately if they would have to
wait.

2. Introduce a way for barriers to participate in the the readiness
protocol used for async execution, so that barrier advances counts as
a kind of readiness. (The asynchronous scheduler probably doesn't
need to know anything about that since it's based on latches which the
WaitSet API already knows how to multiplex.)

3. Teach Hash Join to yield instead of waiting at barriers, asking to
be executed again when the barrier might have advanced.

4. Make sure the Gather node is suitably asynchronicity-aware. At a
minimum it should be able to deal with the child plan yielding (in the
case where it runs in the leader due to lack of better things to do)
and be able to try that again when it needs to.

[1]: /messages/by-id/CAEepm=3a4VaPFnmwcdyUH8gE5_hW4tRvXQkpfQyrzgDQ9gJCYw@mail.gmail.com
[2]: /messages/by-id/15661.1109887540@sss.pgh.pa.us
[3]: 849074f9ae422c64501bb1d53ef840de870bf65c

--
Thomas Munro
http://www.enterprisedb.com

#33

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Ashutosh Bapat (#31)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 9, 2017 at 2:03 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

0004-hj-refactor-batch-increases-v4.patch:

Modify the existing hash join code to detect work_mem exhaustion at
the point where chunks are allocated, instead of checking after every
tuple insertion. This matches the logic used for estimating, and more
importantly allows for some parallelism in later patches.

The patch has three changes
1. change dense_alloc() to accept respect_workmem argument and use it
within the function.
2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from
ExecHashTableInsert() to account for memory before inserting new tuple
3. Check growEnabled before calling ExecHashIncreaseNumBatches().

Thanks for the review!

I think checking growEnabled within ExecHashIncreaseNumBatches() is
more easy to maintain that checking at every caller. If someone is to
add a caller tomorrow, s/he has to remember to add the check.

Hmm. Yeah. In the later 0010 patch ExecHashIncreaseNumBatches will
be used in a slightly different way -- not for making decisions or
performing the hash table shrink, but only for reallocating the batch
arrays. I will see if putting the growEnabled check back in there in
the 0004 patch and then refactoring in a later patch makes more sense
to someone reviewing the patches independently, for the next version.

It might be better to add some comments in
ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
called with respect_work_mem = false? ExecHashSkewTableInsert() does
call ExecHashIncreaseNumBatches() after calling
ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
expect increase in space used and thus go beyond work_mem for a short
while. Is there a way we can handle this case in dense_alloc()?

Right, that needs some explanation, which I'll add for the next
version. The explanation is that while 'shrinking' the hash table, we
may need to go over the work_mem limit by one chunk for a short time.
That is already true in master, but by moving the work_mem checks into
dense_alloc I ran into the problem that dense_alloc might decide to
shrink the hash table which needs to call dense alloc. Shrinking
works by spinning through all the chunks copying only the tuples we
want to keep into new chunks and freeing the old chunks as we go. We
will temporarily go one chunk over work_mem when we allocate the first
new chunk but before we've freed the first old one. We don't want
shrink operations to trigger recursive shrink operations, so we
disable respect for work_mem when calling it from
ExecHashIncreaseNumBatches. In the course of regular hash table
loading, we want to respect work_mem.

Looking at the v5 patch series I posted yesterday, I see that in fact
ExecHashIncreaseNumBatches calls dense_alloc with respect_work_mem =
true in the 0004 patch, and then I corrected that mistake in the 0008
patch; I'll move the correction back to the 0004 patch in the next
version.

To reach ExecHashIncreaseNumBatches, see the "ugly" query in
hj-test-queries.sql (posted with v5).

In ExecHashRemoveNextSkewBucket I preserved the existing behaviour of
not caring about work_mem when performing the rare operation of
copying a tuple from the skew bucket into a dense_alloc memory chunk
so it can be inserted into a regular (non-skew) bucket.

Is it possible that increasing the number of batches changes the
bucket number of the tuple being inserted? If so, should we
recalculate the bucket and batch of the tuple being inserted?

No -- see the function documentation for ExecHashGetBucketAndBatch.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#33)

Re: WIP: [[Parallel] Shared] Hash

Out of archeological curiosity, I was digging around in the hash join
code and RCS history from Postgres 4.2[1]http://db.cs.berkeley.edu/postgres.html, and I was astounded to
discover that it had a parallel executor for Sequent SMP systems and
was capable of parallel hash joins as of 1991. At first glance, it
seems to follow approximately the same design as I propose: share a
hash table and use a barrier to coordinate the switch from build phase
to probe phase and deal with later patches. It uses mmap to get space
and then works with relative pointers. See
src/backend/executor/n_hash.c and src/backend/executor/n_hashjoin.c.
Some of this might be described in Wei Hong's PhD thesis[2]http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf which I
haven't had the pleasure of reading yet.

The parallel support is absent from the first commit in our repo
(1996), but there are some vestiges like RelativeAddr and ABSADDR used
to access the hash table (presumably needlessly) and also some
mentions of parallel machines in comments that survived up until
commit 26069a58 (1999).

[1]: http://db.cs.berkeley.edu/postgres.html
[2]: http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#32)

Re: WIP: [[Parallel] Shared] Hash

Hi,

On 2017-02-13 23:57:00 +1300, Thomas Munro wrote:

Here's a new version to fix the problems reported by Rafia above. The
patch descriptions are as before but it starts from 0002 because 0001
was committed as 7c5d8c16 (thanks, Andres).

FWIW, I'd appreciate if you'd added a short commit message to the
individual patches - I find it helpful to have a littlebit more context
while looking at them than just the titles. Alternatively you can
include that text when re-posting the series, but it's imo just as easy
to have a short commit message (and just use format-patch).

I'm for now using [1]http://archives.postgresql.org/message-id/CAEepm%3D1D4-tP7j7UAgT_j4ZX2j4Ehe1qgZQWFKBMb8F76UW5Rg%40mail.gmail.com as context.

0002-hj-add-dtrace-probes-v5.patch

Hm. I'm personally very unenthusiastic about addming more of these, and
would rather rip all of them out. I tend to believe that static
problems simply aren't a good approach for anything requiring a lot of
detail. But whatever.

0003-hj-refactor-memory-accounting-v5.patch
@@ -424,15 +422,29 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
if (ntuples <= 0.0)
ntuples = 1000.0;

-	/*
-	 * Estimate tupsize based on footprint of tuple in hashtable... note this
-	 * does not allow for any palloc overhead.  The manipulations of spaceUsed
-	 * don't count palloc overhead either.
-	 */
+	/* Estimate tupsize based on footprint of tuple in hashtable. */

palloc overhead is still unaccounted for, no? In the chunked case that
might not be much, I realize that (so that comment should probably have
been updated when chunking was introduced).

-	Size		spaceUsed;		/* memory space currently used by tuples */
+	Size		spaceUsed;		/* memory space currently used by hashtable */

It's not really the full hashtable, is it? The ->buckets array appears
to still be unaccounted for.

Looks ok.

0004-hj-refactor-batch-increases-v5.patch

@@ -1693,10 +1689,12 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
}

 /*
- * Allocate 'size' bytes from the currently active HashMemoryChunk
+ * Allocate 'size' bytes from the currently active HashMemoryChunk.  If
+ * 'respect_work_mem' is true, this may cause the number of batches to be
+ * increased in an attempt to shrink the hash table.
  */
 static void *
-dense_alloc(HashJoinTable hashtable, Size size)
+dense_alloc(HashJoinTable hashtable, Size size, bool respect_work_mem)

{
 	HashMemoryChunk newChunk;
 	char	   *ptr;
@@ -1710,6 +1708,15 @@ dense_alloc(HashJoinTable hashtable, Size size)
 	 */
 	if (size > HASH_CHUNK_THRESHOLD)
 	{
+		if (respect_work_mem &&
+			hashtable->growEnabled &&
+			hashtable->spaceUsed + HASH_CHUNK_HEADER_SIZE + size >
+			hashtable->spaceAllowed)
+		{
+			/* work_mem would be exceeded: try to shrink hash table */
+			ExecHashIncreaseNumBatches(hashtable);
+		}
+

Isn't it kinda weird to do this from within dense_alloc()? I mean that
dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies. Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

0005-hj-refactor-unmatched-v5.patch

I'm a bit confused as to why unmatched tuple scan is a good parallelism
target, but I might see later...

0006-hj-barrier-v5.patch

Skipping that here.

0007-hj-exec-detach-node-v5.patch

Hm. You write elsewhere:

By the time ExecEndNode() runs in workers, ExecShutdownNode() has
already run. That's done on purpose because, for example, the hash
table needs to survive longer than the parallel environment to allow
EXPLAIN to peek at it. But it means that the Gather node has thrown
out the shared memory before any parallel-aware node below it gets to
run its Shutdown and End methods. So I invented ExecDetachNode()
which runs before ExecShutdownNode(), giving parallel-aware nodes a
chance to say goodbye before their shared memory vanishes. Better
ideas?

To me that is a weakness in the ExecShutdownNode() API - imo child nodes
should get the chance to shutdown before the upper-level node.
ExecInitNode/ExecEndNode etc give individual nodes the freedom to do
things in the right order, but ExecShutdownNode() doesn't. I don't
quite see why we'd want to invent a separate ExecDetachNode() that'd be
called immediately before ExecShutdownNode().

An easy way to change that would be to return in the
ExecShutdownNode()'s T_GatherState case, and delegate the responsibility
of calling it on Gather's children to ExecShutdownGather().
Alternatively we could make it a full-blown thing like ExecInitNode()
that every node needs to implement, but that seems a bit painful.

Or have I missed something here?

Random aside: Wondered before if having to provide all executor
callbacks is a weakness of our executor integration, and whether it
shouldn't be a struct of callbacks instead...

0008-hj-shared-single-batch-v5.patch

First-off: I wonder if we should get the HASHPATH_TABLE_SHARED_SERIAL
path committed first. ISTM that's already quite beneficial, and there's
a good chunk of problems that we could push out initially.

This desperately needs tests.

Have you measured whether the new branches in nodeHash[join] slow down
non-parallel executions? I do wonder if it'd not be better to have to
put the common code in helper functions and have seperate
T_SharedHashJoin/T_SharedHash types. If you both have a parallel and
non-parallel hash in the same query, the branches will be hard to
predict...

I think the synchronization protocol with the various phases needs to be
documented somewhere. Probably in nodeHashjoin.c's header.

The state machine code in MultiExecHash() also needs more
comments. Including the fact that avoiding repeating work is done by
"electing" leaders via BarrierWait().

I wonder if it wouldn't be better to inline the necessary code into the
switch (with fall-throughs), instead of those gotos; putting some of the
relevant code (particularly the scanning of the child node) into
seperate functions.

+ build:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* Make sure our local state is up-to-date so we can build. */
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_BUILDING);
+		ExecHashUpdate(hashtable);
+	}
+
 	/*
 	 * set expression context
 	 */
@@ -128,18 +197,78 @@ MultiExecHash(HashState *node)

Why's is the parallel code before variable initialization stuff like
setting up econtext?

Introduces hash joins with "Shared Hash" and "Parallel Shared Hash"
nodes, for single-batch joins only.

We don't necessarily know that ahead of time. So this isn't something
that we could actually apply separately, right?

 	/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-	if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-		ExecHashIncreaseNumBuckets(hashtable);
+	ExecHashUpdate(hashtable);
+	ExecHashIncreaseNumBuckets(hashtable);

It's kinda weird that we had the nearly redundant nbuckets !=
nbuckets_optimal checks before...

+static void *
+dense_alloc_shared(HashJoinTable hashtable,
+				   Size size,
+				   dsa_pointer *shared)

Hm. I wonder if HASH_CHUNK_SIZE being only 32kb isn't going to be a
bottlenck here.

@@ -195,6 +238,40 @@ ExecHashJoin(HashJoinState *node)
 				if (TupIsNull(outerTupleSlot))
 				{
 					/* end of batch, or maybe whole join */
+
+					if (HashJoinTableIsShared(hashtable))
+					{
+						/*
+						 * An important optimization: if this is a
+						 * single-batch join and not an outer join, there is
+						 * no reason to synchronize again when we've finished
+						 * probing.
+						 */
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_PROBING);
+						if (hashtable->nbatch == 1 && !HJ_FILL_INNER(node))
+							return NULL;	/* end of join */
+

I think it's a bit weird that the parallel path now has an exit path
that the non-parallel path doesn't have.

+	 * If this is a shared hash table, there is a extra charge for inserting
+	 * each tuple into the shared hash table to cover memory synchronization
+	 * overhead, compared to a private hash table.  There is no extra charge
+	 * for probing the hash table for outer path row, on the basis that
+	 * read-only access to a shared hash table shouldn't be any more
+	 * expensive.
+	 *
+	 * cpu_shared_tuple_cost acts a tie-breaker controlling whether we prefer
+	 * HASHPATH_TABLE_PRIVATE or HASHPATH_TABLE_SHARED_SERIAL plans when the
+	 * hash table fits in work_mem, since the cost is otherwise the same.  If
+	 * it is positive, then we'll prefer private hash tables, even though that
+	 * means that we'll be running N copies of the inner plan.  Running N
+	 * copies of the copies of the inner plan in parallel is not considered
+	 * more expensive than running 1 copy of the inner plan while N-1
+	 * participants do nothing, despite doing less work in total.
+	 */
+	if (table_type != HASHPATH_TABLE_PRIVATE)
+		startup_cost += cpu_shared_tuple_cost * inner_path_rows;
+
+	/*
+	 * If this is a parallel shared hash table, then the value we have for
+	 * inner_rows refers only to the rows returned by each participant.  For
+	 * shared hash table size estimation, we need the total number, so we need
+	 * to undo the division.
+	 */
+	if (table_type == HASHPATH_TABLE_SHARED_PARALLEL)
+		inner_path_rows_total *= get_parallel_divisor(inner_path);
+
+	/*

Is the per-tuple cost really the same for HASHPATH_TABLE_SHARED_SERIAL
and PARALLEL?

Don't we also need to somehow account for the more expensive hash-probes
in the HASHPATH_TABLE_SHARED_* cases? Seems quite possible that we'll
otherwise tend to use shared tables for small hashed tables that are
looked up very frequently, even though a private one will likely be
faster.

+	/*
+	 * Set the table as sharable if appropriate, with parallel or serial
+	 * building.  If parallel, the executor will also need an estimate of the
+	 * total number of rows expected from all participants.
+	 */

Oh. I was about to comment that sharable is wrong, just to discover it's
valid in NA. Weird.

@@ -2096,6 +2096,7 @@ create_mergejoin_path(PlannerInfo *root,
  * 'required_outer' is the set of required outer rels
  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
  *		(this should be a subset of the restrict_clauses list)
+ * 'table_type' to select [[Parallel] Shared] Hash
  */
 HashPath *
 create_hashjoin_path(PlannerInfo *root,

Reminds me that you're not denoting the Parallel bit in explain right
now - intentionally so?

 /*
- * To reduce palloc overhead, the HashJoinTuples for the current batch are
- * packed in 32kB buffers instead of pallocing each tuple individually.
+ * To reduce palloc/dsa_allocate overhead, the HashJoinTuples for the current
+ * batch are packed in 32kB buffers instead of pallocing each tuple
+ * individually.

s/palloc\/dsa_allocate/allocator/?

@@ -112,8 +121,12 @@ typedef struct HashMemoryChunkData
size_t maxlen; /* size of the buffer holding the tuples */
size_t used; /* number of buffer bytes already used */

-	struct HashMemoryChunkData *next;	/* pointer to the next chunk (linked
-										 * list) */
+	/* pointer to the next chunk (linked list) */
+	union
+	{
+		dsa_pointer shared;
+		struct HashMemoryChunkData *unshared;
+	} next;

This'll increase memory usage on some platforms, e.g. when using
spinlock backed atomics. I tend to think that that's fine, but it's
probably worth calling out.

--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -787,7 +787,15 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_HASH_BEGINNING,
+	WAIT_EVENT_HASH_CREATING,
+	WAIT_EVENT_HASH_BUILDING,
+	WAIT_EVENT_HASH_RESIZING,
+	WAIT_EVENT_HASH_REINSERTING,
+	WAIT_EVENT_HASH_UNMATCHED,
+	WAIT_EVENT_HASHJOIN_PROBING,
+	WAIT_EVENT_HASHJOIN_REWINDING
 } WaitEventIPC;

Hm. That seems a bit on the detailed side - if we're going that way it
seems likely that we'll end up with hundreds of wait events. I don't
think gradually evolving wait events into something like a query
progress framework is a good idea.

That's it for now...

- Andres

[1]: http://archives.postgresql.org/message-id/CAEepm%3D1D4-tP7j7UAgT_j4ZX2j4Ehe1qgZQWFKBMb8F76UW5Rg%40mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Andres Freund (#35)

Re: WIP: [[Parallel] Shared] Hash

Hi,

Just to summarize what you could read between the lines in the previous
mail: From a higher level POV the design here makes sense to me, I do
however think there's a good chunk of code-level improvements needed.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#35)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for the review!

FWIW, I'd appreciate if you'd added a short commit message to the
individual patches - I find it helpful to have a littlebit more context
while looking at them than just the titles. Alternatively you can
include that text when re-posting the series, but it's imo just as easy
to have a short commit message (and just use format-patch).

I'm for now using [1] as context.

Ok, will do.

0002-hj-add-dtrace-probes-v5.patch

Hm. I'm personally very unenthusiastic about addming more of these, and
would rather rip all of them out. I tend to believe that static
problems simply aren't a good approach for anything requiring a lot of
detail. But whatever.

Ok, I will get rid of these. Apparently you aren't the only committer
who hates these. (I have some other thoughts on that but will save
them for another time.)

0003-hj-refactor-memory-accounting-v5.patch
@@ -424,15 +422,29 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
if (ntuples <= 0.0)
ntuples = 1000.0;
-       /*
-        * Estimate tupsize based on footprint of tuple in hashtable... note this
-        * does not allow for any palloc overhead.  The manipulations of spaceUsed
-        * don't count palloc overhead either.
-        */
+       /* Estimate tupsize based on footprint of tuple in hashtable. */
palloc overhead is still unaccounted for, no? In the chunked case that
might not be much, I realize that (so that comment should probably have
been updated when chunking was introduced).

Yeah, it seemed like an obsolete comment, but I'll put it back as that
isn't relevant to this patch.

-       Size            spaceUsed;              /* memory space currently used by tuples */
+       Size            spaceUsed;              /* memory space currently used by hashtable */
It's not really the full hashtable, is it? The ->buckets array appears
to still be unaccounted for.

It is actually the full hash table, and that is a change in this
patch. See ExecHashTableCreate and ExecHashTableReset where is it set
to nbuckets * sizeof(HashJoinTuple), so that at all times it holds the
total size of buckets + all chunks. Unlike the code in master, where
it's just the sum of all tuples while building, but then the bucket
space is added at the end in MultiExecHash.

Looks ok.

Thanks!

0004-hj-refactor-batch-increases-v5.patch

@@ -1693,10 +1689,12 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
}

/*
- * Allocate 'size' bytes from the currently active HashMemoryChunk
+ * Allocate 'size' bytes from the currently active HashMemoryChunk.  If
+ * 'respect_work_mem' is true, this may cause the number of batches to be
+ * increased in an attempt to shrink the hash table.
*/
static void *
-dense_alloc(HashJoinTable hashtable, Size size)
+dense_alloc(HashJoinTable hashtable, Size size, bool respect_work_mem)

{
HashMemoryChunk newChunk;
char       *ptr;
@@ -1710,6 +1708,15 @@ dense_alloc(HashJoinTable hashtable, Size size)
*/
if (size > HASH_CHUNK_THRESHOLD)
{
+               if (respect_work_mem &&
+                       hashtable->growEnabled &&
+                       hashtable->spaceUsed + HASH_CHUNK_HEADER_SIZE + size >
+                       hashtable->spaceAllowed)
+               {
+                       /* work_mem would be exceeded: try to shrink hash table */
+                       ExecHashIncreaseNumBatches(hashtable);
+               }
+

Isn't it kinda weird to do this from within dense_alloc()? I mean that
dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies.

Hmm. Yeah I guess that is a bit weird. My problem was that in the
shared case (later patch), when you call this function it has a fast
path and a slow path: the fast path just hands out more space from the
existing chunk, but the slow path acquires an LWLock and makes space
management decisions which have to be done sort of "atomically". In
an earlier version I toyed with the idea of making dense_alloc return
NULL if you said respect_work_mem = true and it determined that you
need to increase the number of batches or go help other workers who
have already started doing so. Then the batch increase stuff was not
in here, but callers who say respect_work_mem = true (the build and
reload loops) had to be prepared to loop and shrink if they get NULL,
or some wrapper function needs to do that them. I will try it that
way again.

Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

I'm not sure if I get what you mean here.
ExecHashRemoveNextSkewBucket calls dense_alloc with respect_work_mem =
false, so it's not going to enter that path.

0005-hj-refactor-unmatched-v5.patch

I'm a bit confused as to why unmatched tuple scan is a good parallelism
target, but I might see later...

Macroscopically because any time we can spread the resulting tuples
over all participants, we enable parallelism in all executor nodes
above this one in the plan. Suppose I made one worker do the
unmatched scan while the others twiddled their thumbs; now some other
join above me finishes up with potentially many tuples all in one
process while the rest do nothing.

Microscopically because we may be spinning through 1GB of memory
testing these bits, and the way that it is coded in master will do
that in random order whereas this way will be in sequential order,
globally and within each participant. (You could stuff the matched
bits all up one end of each chunk, so that they'd all fit in a
cacheline... but not suggesting that or any other micro-optimisation
for the sake of it: the main reason is the macroscopic one.)

0006-hj-barrier-v5.patch

Skipping that here.

0007-hj-exec-detach-node-v5.patch

Hm. You write elsewhere:

By the time ExecEndNode() runs in workers, ExecShutdownNode() has
already run. That's done on purpose because, for example, the hash
table needs to survive longer than the parallel environment to allow
EXPLAIN to peek at it. But it means that the Gather node has thrown
out the shared memory before any parallel-aware node below it gets to
run its Shutdown and End methods. So I invented ExecDetachNode()
which runs before ExecShutdownNode(), giving parallel-aware nodes a
chance to say goodbye before their shared memory vanishes. Better
ideas?

To me that is a weakness in the ExecShutdownNode() API - imo child nodes
should get the chance to shutdown before the upper-level node.
ExecInitNode/ExecEndNode etc give individual nodes the freedom to do
things in the right order, but ExecShutdownNode() doesn't. I don't
quite see why we'd want to invent a separate ExecDetachNode() that'd be
called immediately before ExecShutdownNode().

Hmm. Yes that makes sense, I think.

An easy way to change that would be to return in the
ExecShutdownNode()'s T_GatherState case, and delegate the responsibility
of calling it on Gather's children to ExecShutdownGather().

That might work for the leader but maybe not for workers (?)

Alternatively we could make it a full-blown thing like ExecInitNode()
that every node needs to implement, but that seems a bit painful.

Or have I missed something here?

Let me try a couple of ideas and get back to you.

Random aside: Wondered before if having to provide all executor
callbacks is a weakness of our executor integration, and whether it
shouldn't be a struct of callbacks instead...

0008-hj-shared-single-batch-v5.patch

First-off: I wonder if we should get the HASHPATH_TABLE_SHARED_SERIAL
path committed first. ISTM that's already quite beneficial, and there's
a good chunk of problems that we could push out initially.

The reason I don't think we can do that is because single-batch hash
joins can turn into multi-batch hash joins at execution time, unless
you're prepared to use unbounded memory in rare cases. I don't think
that's acceptable. I had the single batch shared hash code working
reasonably well early on, and then came to understand that it couldn't
really be committed without the full enchilada, because melting your
server is not a reasonable thing to do if the estimates are off. Then
I spent a really long time battling with the multi-batch case to get
here!

This desperately needs tests.

Will add.

Have you measured whether the new branches in nodeHash[join] slow down
non-parallel executions? I do wonder if it'd not be better to have to
put the common code in helper functions and have seperate
T_SharedHashJoin/T_SharedHash types. If you both have a parallel and
non-parallel hash in the same query, the branches will be hard to
predict...

Huh. That is an interesting thought. Will look into that.

I think the synchronization protocol with the various phases needs to be
documented somewhere. Probably in nodeHashjoin.c's header.

Will do.

The state machine code in MultiExecHash() also needs more
comments. Including the fact that avoiding repeating work is done by
"electing" leaders via BarrierWait().

Ok.

I wonder if it wouldn't be better to inline the necessary code into the
switch (with fall-throughs), instead of those gotos; putting some of the
relevant code (particularly the scanning of the child node) into
seperate functions.

Right, this comes from a desire to keep the real code common for
private and shared hash tables. I will look into other ways to
structure it.

+ build:
+       if (HashJoinTableIsShared(hashtable))
+       {
+               /* Make sure our local state is up-to-date so we can build. */
+               Assert(BarrierPhase(barrier) == PHJ_PHASE_BUILDING);
+               ExecHashUpdate(hashtable);
+       }
+
/*
* set expression context
*/
@@ -128,18 +197,78 @@ MultiExecHash(HashState *node)

Why's is the parallel code before variable initialization stuff like
setting up econtext?

Will move.

Introduces hash joins with "Shared Hash" and "Parallel Shared Hash"
nodes, for single-batch joins only.

We don't necessarily know that ahead of time. So this isn't something
that we could actually apply separately, right?

Indeed, as mentioned above.

/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-       if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-               ExecHashIncreaseNumBuckets(hashtable);
+       ExecHashUpdate(hashtable);
+       ExecHashIncreaseNumBuckets(hashtable);

It's kinda weird that we had the nearly redundant nbuckets !=
nbuckets_optimal checks before...

+static void *
+dense_alloc_shared(HashJoinTable hashtable,
+                                  Size size,
+                                  dsa_pointer *shared)
Hm. I wonder if HASH_CHUNK_SIZE being only 32kb isn't going to be a
bottlenck here.

Yeah, I should benchmark some different sizes.

@@ -195,6 +238,40 @@ ExecHashJoin(HashJoinState *node)
if (TupIsNull(outerTupleSlot))
{
/* end of batch, or maybe whole join */
+
+                                       if (HashJoinTableIsShared(hashtable))
+                                       {
+                                               /*
+                                                * An important optimization: if this is a
+                                                * single-batch join and not an outer join, there is
+                                                * no reason to synchronize again when we've finished
+                                                * probing.
+                                                */
+                                               Assert(BarrierPhase(&hashtable->shared->barrier) ==
+                                                          PHJ_PHASE_PROBING);
+                                               if (hashtable->nbatch == 1 && !HJ_FILL_INNER(node))
+                                                       return NULL;    /* end of join */
+

I think it's a bit weird that the parallel path now has an exit path
that the non-parallel path doesn't have.

Indeed, but I think it's fairly clearly explained? Do you think there
is something unsafe about exiting in that state?

+        * If this is a shared hash table, there is a extra charge for inserting
+        * each tuple into the shared hash table to cover memory synchronization
+        * overhead, compared to a private hash table.  There is no extra charge
+        * for probing the hash table for outer path row, on the basis that
+        * read-only access to a shared hash table shouldn't be any more
+        * expensive.
+        *
+        * cpu_shared_tuple_cost acts a tie-breaker controlling whether we prefer
+        * HASHPATH_TABLE_PRIVATE or HASHPATH_TABLE_SHARED_SERIAL plans when the
+        * hash table fits in work_mem, since the cost is otherwise the same.  If
+        * it is positive, then we'll prefer private hash tables, even though that
+        * means that we'll be running N copies of the inner plan.  Running N
+        * copies of the copies of the inner plan in parallel is not considered
+        * more expensive than running 1 copy of the inner plan while N-1
+        * participants do nothing, despite doing less work in total.
+        */
+       if (table_type != HASHPATH_TABLE_PRIVATE)
+               startup_cost += cpu_shared_tuple_cost * inner_path_rows;
+
+       /*
+        * If this is a parallel shared hash table, then the value we have for
+        * inner_rows refers only to the rows returned by each participant.  For
+        * shared hash table size estimation, we need the total number, so we need
+        * to undo the division.
+        */
+       if (table_type == HASHPATH_TABLE_SHARED_PARALLEL)
+               inner_path_rows_total *= get_parallel_divisor(inner_path);
+
+       /*

Is the per-tuple cost really the same for HASHPATH_TABLE_SHARED_SERIAL
and PARALLEL?

I *guess* the real cost for insertion depends on hard-to-estimate
things like collision probability (many tuples into same bucket, also
false sharing on same cacheline). I think the dynamic partitioning
based parallel hash join systems would use the histogram to deal with
balancing for their more course-grained disjoint version of this
problem, but that seemed like overkill for this. I just added a
simple GUC cpu_shared_tuple_cost to model the cost for inserting,
primarily as a tie-breaker so that we'd prefer private hash tables to
shared ones, unless shared ones allow us to avoid batching or enable
parallel build.

Let me try to measure the difference in insertion speeds with a few
interesting key distributions and get back to you.

Don't we also need to somehow account for the more expensive hash-probes
in the HASHPATH_TABLE_SHARED_* cases? Seems quite possible that we'll
otherwise tend to use shared tables for small hashed tables that are
looked up very frequently, even though a private one will likely be
faster.

Hmm. I don't expect hash probes to be more expensive. Why should
they be: DSA address decoding? I will try to measure that too.

With the costing as I have it, we should use private tables for small
relations unless there is a partial plan available. If there is a
partial plan it usually looks better because it gets to divide the
whole shemozzle by 2, 3, 8 or whatever. To avoid using shared tables
for small cheap to build tables even if there is a partial plan
available I think we might need an extra cost term which estimates the
number of times we expect to have to wait for peers, and how long you
might have to wait.

The simple version might be a GUC "synchronization_cost", which is the
cost per anticipated barrier wait. In a typical single batch inner
join we could charge one of those (for the wait between building and
probing), and for a single batch outer join we could charge two (you
also have to wait to begin the outer scan). Then, if the subplan
looks really expensive (say a big scan with a lot of filtering), we'll
still go for the partial plan so we can divide the cost by P and we'll
come out ahead even though we have to pay one synchronisation cost,
but if it looks cheap (seq scan of tiny table) we won't bother with a
partial plan because the synchronisation cost wouldn't pay for itself.
Add more for extra batches.

But... that's a bit bogus, because the real cost isn't really some
kind of fixed "synchronisation" per se; it's how long you think it'll
take between the moment the average participant finishes building (ie
runs out of tuples to insert) and the moment the last participant
finishes. That comes down to the granularity of parallelism and the
cost per tuple. For example, parallel index scans and parallel
sequential scans read whole pages at a time; so at some point you hit
the end of the supply of tuples, but one of your peers might have up
to one whole page worth to process, so however long that takes, that's
how long you'll have to wait for that guy to be finished and reach the
barrier. That's quite tricky to estimate, unless you have a way to
ask a child path "how many times to do I have to execute you to pull
one 'granule' of data from your ultimate tuple source", and multiple
that by the path's total cost / path's estimated rows, and then (say)
guesstimate that on average you'll be twiddling your thumbs for half
that many cost units. Or some better maths, but that sort of thing.

Thoughts?

(I suppose a partition-wise join as subplan of a Hash node might
introduce an extreme case of course granularity if it allows
participants to process whole join partitions on their own, so that a
barrier wait at end-of-hash-table-build might leave everyone waiting
24 hours for one peer to finish pulling tuples from the final join
partition in its subplan...?!)

+       /*
+        * Set the table as sharable if appropriate, with parallel or serial
+        * building.  If parallel, the executor will also need an estimate of the
+        * total number of rows expected from all participants.
+        */

Oh. I was about to comment that sharable is wrong, just to discover it's
valid in NA. Weird.

It does look pretty weird now that you mention it! I'll change it,
because "shareable prevails by a 2:1 margin in American texts"
according to http://grammarist.com/spelling/sharable-shareable/ , or
maybe I'll change it to "shared".

@@ -2096,6 +2096,7 @@ create_mergejoin_path(PlannerInfo *root,
* 'required_outer' is the set of required outer rels
* 'hashclauses' are the RestrictInfo nodes to use as hash clauses
*             (this should be a subset of the restrict_clauses list)
+ * 'table_type' to select [[Parallel] Shared] Hash
*/
HashPath *
create_hashjoin_path(PlannerInfo *root,

Reminds me that you're not denoting the Parallel bit in explain right
now - intentionally so?

Yes I am... here are the three cases:

Hash Join
-> [... some parallel-safe plan ...]
-> Hash
-> [... some parallel-safe plan ...]

Parallel Hash Join
-> [... some partial plan ...]
-> Shared Hash
-> [... some parallel-safe plan ...]

Parallel Hash Join
-> [... some partial plan ...]
-> Parallel Shared Hash
-> [... some partial plan ...]

Make sense?

/*
- * To reduce palloc overhead, the HashJoinTuples for the current batch are
- * packed in 32kB buffers instead of pallocing each tuple individually.
+ * To reduce palloc/dsa_allocate overhead, the HashJoinTuples for the current
+ * batch are packed in 32kB buffers instead of pallocing each tuple
+ * individually.

s/palloc\/dsa_allocate/allocator/?

Ok.

@@ -112,8 +121,12 @@ typedef struct HashMemoryChunkData
size_t maxlen; /* size of the buffer holding the tuples */
size_t used; /* number of buffer bytes already used */
-       struct HashMemoryChunkData *next;       /* pointer to the next chunk (linked
-                                                                                * list) */
+       /* pointer to the next chunk (linked list) */
+       union
+       {
+               dsa_pointer shared;
+               struct HashMemoryChunkData *unshared;
+       } next;
This'll increase memory usage on some platforms, e.g. when using
spinlock backed atomics. I tend to think that that's fine, but it's
probably worth calling out.

In the code quoted above it won't because that's a plain dsa_pointer,
not an atomic one. But yeah you're right about HashJoinBucketHead. I
will note with comments.

If I'm looking at the right column of
https://wiki.postgresql.org/wiki/Atomics then concretely we're talking
about 80386 (not the more general i386 architecture but the specific
dead chip), ARM v5, PA-RISC and SparcV8 (and presumably you'd only
bother turning on parallel query if you had an SMP configuration), so
it's a technicality to consider but as long as it compiles and
produces the right answer on those machines I assume it's OK, right?
(Postgres 4.2 also supported parallel hash joins on Sequent 80386 SMP
systems and put a spinlock into each bucket so anyone upgrading their
Sequent system directly from Postgres 4.2 to a theoretical future
PostgreSQL version with this patch will hopefully not consider this to
be a regression.)

On the other hand, I could get rid of the union for each bucket slot
and instead have a union that points to the first bucket, so that such
systems don't have to pay for the wider buckets-with-spinlocks even
when using private hash tables. Will look into that.

Actually I was meaning to ask you something about this: is it OK to
memset all the bucket heads to zero when clearing the hash table or do
I have to loop over them and pg_atomic_write_XXX(&x, 0) to avoid
upsetting the emulated atomic state into a bad state? If that memset
is not safe on emulated-atomics systems then I guess I should probably
consider macros to select between a loop or memset depending on the
implementation.

--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -787,7 +787,15 @@ typedef enum
WAIT_EVENT_MQ_SEND,
WAIT_EVENT_PARALLEL_FINISH,
WAIT_EVENT_SAFE_SNAPSHOT,
-       WAIT_EVENT_SYNC_REP
+       WAIT_EVENT_SYNC_REP,
+       WAIT_EVENT_HASH_BEGINNING,
+       WAIT_EVENT_HASH_CREATING,
+       WAIT_EVENT_HASH_BUILDING,
+       WAIT_EVENT_HASH_RESIZING,
+       WAIT_EVENT_HASH_REINSERTING,
+       WAIT_EVENT_HASH_UNMATCHED,
+       WAIT_EVENT_HASHJOIN_PROBING,
+       WAIT_EVENT_HASHJOIN_REWINDING
} WaitEventIPC;
Hm. That seems a bit on the detailed side - if we're going that way it
seems likely that we'll end up with hundreds of wait events. I don't
think gradually evolving wait events into something like a query
progress framework is a good idea.

I thought the idea was to label each wait point in the source so that
an expert can see exactly why we're waiting.

That's it for now...

Thanks! Plenty for me to go away and think about. I will post a new
version soon.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Andres Freund (#35)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund <andres@anarazel.de> wrote:

0002-hj-add-dtrace-probes-v5.patch

Hm. I'm personally very unenthusiastic about addming more of these, and
would rather rip all of them out. I tend to believe that static
problems simply aren't a good approach for anything requiring a lot of
detail. But whatever.

I'm not a big fan of either static problems or static probes, myself.

Isn't it kinda weird to do this from within dense_alloc()? I mean that
dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies. Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

I haven't looked at this, but one idea might be to just rename
dense_alloc() to ExecHashBlahBlahSomething(). If there's a real
abstraction layer problem here then we should definitely fix it, but
maybe it's just the angle at which you hold your head.

To me that is a weakness in the ExecShutdownNode() API - imo child nodes
should get the chance to shutdown before the upper-level node.
ExecInitNode/ExecEndNode etc give individual nodes the freedom to do
things in the right order, but ExecShutdownNode() doesn't. I don't
quite see why we'd want to invent a separate ExecDetachNode() that'd be
called immediately before ExecShutdownNode().

Interestingly, the same point came up on the Parallel Bitmap Heap Scan thread.

An easy way to change that would be to return in the
ExecShutdownNode()'s T_GatherState case, and delegate the responsibility
of calling it on Gather's children to ExecShutdownGather().
Alternatively we could make it a full-blown thing like ExecInitNode()
that every node needs to implement, but that seems a bit painful.

I was thinking we should just switch things so that ExecShutdownNode()
recurses first, and then does the current node. There's no real
excuse for a node terminating the shutdown scan early, I think.

Or have I missed something here?

Random aside: Wondered before if having to provide all executor
callbacks is a weakness of our executor integration, and whether it
shouldn't be a struct of callbacks instead...

I honestly have no idea whether that would be better or worse from the
CPU's point of view.

I think it's a bit weird that the parallel path now has an exit path
that the non-parallel path doesn't have.

I'm not sure about this particular one, but in general those are
pretty common. For example, look at the changes
569174f1be92be93f5366212cc46960d28a5c5cd made to _bt_first(). When
you get there, you can discover that you aren't actually the first,
and that in fact all the work is already complete, and there's nothing
left for you to do but give up.

Hm. That seems a bit on the detailed side - if we're going that way it
seems likely that we'll end up with hundreds of wait events. I don't
think gradually evolving wait events into something like a query
progress framework is a good idea.

I'm pretty strongly of the opinion that we should not reuse multiple
wait events for the same purpose. The whole point of the wait event
system is to identify what caused the wait. Having relatively
recently done a ton of work to separate all of the waits in the system
and identify them individually, I'm loathe to see us start melding
things back together again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#37)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Feb 16, 2017 at 9:08 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <andres@anarazel.de> wrote:

That's it for now...

Thanks! Plenty for me to go away and think about. I will post a new
version soon.

I'm testing a new version which incorporates feedback from Andres and
Ashutosh, and is refactored to use a new SharedBufFileSet component to
handle batch files, replacing the straw-man implementation from the v5
patch series. I've set this to waiting-on-author and will post v6
tomorrow.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#39)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Mar 1, 2017 at 10:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I'm testing a new version which incorporates feedback from Andres and
Ashutosh, and is refactored to use a new SharedBufFileSet component to
handle batch files, replacing the straw-man implementation from the v5
patch series. I've set this to waiting-on-author and will post v6
tomorrow.

I created a system for reference counted partitioned temporary files
called SharedBufFileSet: see 0007-hj-shared-buf-file.patch. Then I
ripped out the code for sharing batch files that I previously had
cluttering up nodeHashjoin.c, and refactored it into a new component
called a SharedTuplestore which wraps a SharedBufFileSet and gives it
a tuple-based interface: see 0008-hj-shared-tuplestore.patch. The
name implies aspirations of becoming a more generally useful shared
analogue of tuplestore, but for now it supports only the exact access
pattern needed for hash join batches ($10 wrench).

It creates temporary files like this:

base/pgsql_tmp/pgsql_tmp[pid].[set].[partition].[participant].[segment]

I'm not sure why nodeHashjoin.c is doing raw batchfile read/write
operations anyway; why not use tuplestore.c for that (as
tuplestore.c's comments incorrectly say is the case)? Maybe because
Tuplestore's interface doesn't support storing the extra hash value.
In SharedTuplestore I solved that problem by introducing an optional
fixed sized piece of per-tuple meta-data. Another thing that is
different about SharedTuplestore is that it supports partitions, which
is convenient for this project and probably other parallel projects
too.

In order for workers to be able to participate in reference counting
schemes based on DSM segment lifetime, I had to give the
Exec*InitializeWorker() functions access to the dsm_segment object,
whereas previously they received only the shm_toc in order to access
its contents. I invented ParallelWorkerContext which has just two
members 'seg' and 'toc': see
0005-hj-let-node-have-seg-in-worker.patch. I didn't touch the FDW API
or custom scan API where they currently take toc, though I can see
that there is an argument that they should; changing those APIs seems
like a bigger deal. Another approach would be to use ParallelContext,
as passed into ExecXXXInitializeDSM, with the members that are not
applicable to workers zeroed out. Thoughts?

I got rid of the ExecDetachXXX stuff I had invented in the last
version, because acf555bc fixed the problem a better way.

I found that I needed to put use more than one toc entry for a single
executor node, in order to reserve space for the inner and outer
SharedTuplestore objects. So I invented a way to make more extra keys
with PARALLEL_KEY_EXECUTOR_NTH(plan_node_id, N).

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-shared-hash-v6.tgzapplication/x-gzip; name=parallel-shared-hash-v6.tgzDownload

�Mj�X��k[��0�����8g��.����b��Ll��Lv�y�4Rz,�u�I���u�[�$aO�>�d��@w���U�V�Z�Y0��p�H��y8l��]�so�?���V�`_���lu���������A���nuU���;����5Q��"I�9e�����v�l4�x/�0?�?���d�a�����y8
i<oL�I<h�A�����Z5gA:�[����������0���S���~����/������4V����8���x����*+x<�[L?%���w�0=��.T�/QB����x�4&(�Y�'��G������!|5	��`:T�Z�t>�#&2�H������]�(��l�7�o�M�b��8�)
����T���8l�I8T��
wuu�E���0�1n��'x0���T	N2!���<���]�yB��6�A��{j�i�����	{%��P5
aX��;�d�
�Ha@���`~�0sSzn� `���������7u���tV
;e|��qP'�$��	��������7/���:��3j�]��$�{�#�A��H�V�sa$�F�W/�����`�B�h�	k�*���
t��v�����qJ���
�M��@�.6o�i8�%�a��@���w��N��b~"F�4J`�j1�����_n�"�U�q���d>���
�����p���S��7@����Y�Vy�����������k��C8���m6���l�j�z�?�G5��n��V5��S���b�Fg�m�?W)bv?��T�Hm�6L�������zH�����q?������t0��.����B�>�T����u"{E�L��`�~��;gk��yJd��~�����B�\�����m��]���~W+��!��G4����\�&�15����^���a�i>�u�/��\{]�`�8#p�����������I:_L���~H3�6N�WW��b�4mKB�9�y���hP����&��`LSN`�H����Z��S?B��Kx���ULC��G]�F�v��Y8������������x����>5��+�j�&0���Z�DD7���~��/p������b�?�*L�5 �&��{E$��V��`ufq�DA��y�X<�W��E�L<bG�
,u}�O������1�6,H�Q���%�����}U�k�;f�^��q���,l
��jJ'�S���}4L�d�� �9�5�qS��s�j�h�����������L��O�����Hx��l�]"��Yl7�M��!�Ik�	�t$�{�\0}PL!*������<�Sh�cZ���i`�g�b��������@�_2���������?�=���������S�����������]l]y��#g"[�
���vz��7M���?������B���0�lf&������uC&h���z�/����������oO���?���-�������@�5�!V��Us�BT�]�0A��W�8NB�3,@oO�=��g8Q��`^���8�A_���}"I���6�~A�dD�E<�����(���u�F�B4�m�z)�>�#�����%/�3�����q1�����$+��]��"����%r�������qL=4�'����w.�2�$#-c&�V.@���=���x��~�����x��m�4����'��?����?���h<��\�2��h|�V�?C4A�Y��{�|��H���3���x�;b��W��px���D�$@�(%M�_R:A�D�f��������q8-�m4���e>����O����l1� ����(9���;e8��
'J��"����<a@�	8�|�
2re$�1�����}��^���[&f�LY(\�\�"�Y���-������E_*���������8������^9����]U;j�s�%���<-�����k�;��_�i��U��Jd��>���q27�L"�{[pX�|��`�AE'�,�M����]~�'(S��m9_.A#W��az������u���z��v���Y��t�xZ9�����h~��G�EN������fZ��)J���Hs�������(^$j�t����@�8,���X�m�$��
��m��j��DzG��x����ipw�y�K��ws��8~���B�<�������}��H���}"�E*�������o�|�C$i�M~zay��� qUn�c�y�Qg������g��0]����|���	A�Q��{���A��z��!R�Q�A�m(c��P���	oA�������t��u�Ho)!|��<*:�������_�:�dp����,���t
8�+���/����C��E��>9!�@,oV*�r%��P���C���$G�O7��]<F�����F8��_�-��'l��
V��V���Qw���hEQ�����~���
�����
�
�U�V���M\~��!}���p�PC7HU�o��F�d}�>��9�<��V��P���{KT��!���v�vaoFe���)���mc������I8��"��Hm��������;|
��-���������&@2$��L�q��K	i9��y4H�i��.#�5R���aA�?`A�{76,���r���(F����h�n��|o/�4�e����}e���g�g�g�=�Z��>����H	E4�g�1�i�-����X��������g�'Y��o���������^;g�o��m��W���.�k���F]��^�K6ihS�1������a�)`�����z�������(Z���!yX'���i!{#4x������V���
�&���\�Ff��K�7+$��M���&�F�|7�� e�D��S���72<����F��^o�[�����I���]s�>�������XY��z}�N_,&7���"��pN~/�"=��T_@W����+_�����e6�7�v_6���Pm|�� �X�-P��A�J���~��~Oo������h���?���wG�@Z�4������"%���<�OEm����9�|w6����������'�(��#t���A��-���������zM��/���N~Q�SW��_�K����D����
�tg{{�^]���&CH��Wx�m�Y�E�ZE�k�=�m�lWl�wy��ol��.��~3��$���������;x!�U����	��`������4��Bd!�
�CD)�7��5�G��`��H"��AW����[tVcI�Oa	�)�`���x3�B��x��H��pz��_�6_
���7�'�y���NN��l&�e�|�]����I�#�Q�G����������eV	�M�f���V���g0�����������>Yk���s�u�����?���,��w 7�n�'Q�883��K�e��bh-C�|��>�(��c��!��F���������*�����!do/���9a���h#��A:����,��q
�&�q��h�e����
Q�c�|�a�K9�28u5
`^��+���76f��e��H�~����\�_Y��l�O�n����;P����uF1~�F���Rn���0��wD��eE�g�~B�\g}�$���xI ��27�~�(}�&�oH)��Y��G 6�S���4QX��ek���F����p����Q U:����]0'}�,������g|{�q&+%V:F)qO���?v�O%�������P����6^9��������I�:���o��2�$ f�U�g
����(�qZ��$w9V=��z:pC�{� =�#�F������������!�[c��1���T���u�z�1��U;���Jl��������*��]O����m�����[��=��e��{��������������f}�YE�p65^�U\�B_��������Qys��������I�c���r�_O�1�G!����f�����L������O����H������b�/�ADS��(uW��[)uE��lv����S��ac��]�_����a��]�u<�����dt�3J�tv����I���.�y��=��t��+g��a��
��t��������j1g��"��=f���b���"����X=v��N������~X@�|��!�����L��\��~���CL"�o���d��|�u���,�2�����V������I��"��Z��&�_�X���|��(�������wa���A�X�*C����x�/���s^W,'���[Z�����(���Q���<�\}�������qo���I{�D��������PI0����v���\�
d�2��^�Wo�����!K��Wn���v�d*z������IR���1��)@.�����!{�AmM�iip[{x7���xRm��-�oJ�[�#X�v�C�Gl�0D����7(����L��D������~c�f�^���|�g6L��c������>��������g�-�l%���-����
N���&���J
����Q<��*&���+]���.r#�\p�G?,#���~et3K`�Fb/����1��t��U��3o:n~U��R���\�q�m#��F����$*�_{Nc2pKl%�6��<���=w��&fN� ���X�mgI�q��q����������'�W��2�J�[��PC���e|<c�{d���N�f��RU{�"m��[����p�K�'w8=������^�7���"��JB�B�h����8�����������f�5L�����KM���i�=*k�,�a��#�$u�����811���zsN@`t^	���Q��CZN��D�w:�q�����]4|��;�����y�s���[�������#��&�������k	�������6��Z�=���}EQ��(�������$��h�>~�^g����\]z�����%��,����qp[e���D������R������F{d��QV���_�G�&Y�;��SE��/j5d��=���"��e�L5}	Q/i9	E?��/���5+��~�p�a�����jlu�9sL���%dO=2�V��i��(35��>��3G	��������!>�xSG�a�*�20��c&j������%b�yf�`��H����({��o�e/����A�0�t~3/�CJd�?84i��8�����5�z#�x��b�c5���L��6s��=���4*�H���1�p����_f��;M���;�]�D��+l����9Dq`��*����~��u��������&���p�����oY���V�����V��������W�w2�k��2\�1���D����@��#����\?�2U�����0
&�@Qh����RL��R��{�>�H<�7|9��S����d�Tr�����~8���P4&��.��&Wn��G��@�����Gx
9�����MV�%Bs4���/��L�
g�`f=��Z�}�9�����qt3�p���c��r��8��F���;�!���Y�D��a���=�4�G�����+��+��������]��Fs�+�r�a��YD�!�k6w��6u�0��l���i����8�	�������Wb�L��d6n��+�������>V�I�Nl�$�<���w�&`����T��y���I�P�'�����o�K#t������0m�2u:��EAsE��'��5��D�Q�7�	����l6�^{�?$����w���x	���FZu�����PF�����/u��a�t�2�_ t����N5�9�7��i ��gd�q[uZ����'��<���[��:
?�-z�	��w�x13�3�
m�/r���1�	�'G6�q��i��OSL%���P�GJv�k���<x�g�
%*�����6P>$�%$�5�;�T����x!!��BI�('#vh!���xd`B `���<J�&��Y�t{�E��YJ���T`������������������IM>���EB��5kz�;���=���2�)��/>����q�8��F�������������P�8�x2[Q��)4&�Z`��@'@n�j&
�;����ps�����4���	����!=��	E���b���q�?����N�u�"�lK~W;a�����=oyW�h�6|���j���d����<����������m��=��<}���(�s0��9�����?K��e���4�������-.��Uo*V6�=#���)�����M��;5��pS��{�;KI��{9�z����x#R������'W�E�;�d��C���8HXt�	Jg�x<������
3����i�#�t(�PG�8A#�8^�D�C�!��Q�@���Pf�������wvf�V���{!��D���fa
��sT:����8{��MZ�Y��]�m@�@'��,g'Md58�+�Y[�cM���j>�����<fL)t�Y����S4����:7����$Y%��DTlI���n'�h�1���r��BS1�C�8a��i��*7�gk�����a��v�X�M�*��0Qu�0J��Q�e�������U��K4)To��[2�����U`
Y�1P�A�B/eLy��!�M��a��i�^QC�"�:��s����C�������������;�47k���L�V���B�9�rC�3Zc�0��7��&/0%���Y�5G6�,���%~������Ln��=����j������iI�CQ���A6$_P�X�����|KNI���h~�cwh��
�����]����V�q"�
r�^a�c)��!���)���({��������.AEI��"+�S���(�{r�=8�C�!	��E���r�Z�r�e��Y��"�2S9�]����W.�E��C������0��b��t���.F�����������)��S���`�e�{�"�����=P����`�5,:Qyze�$Q4���������K�/$�h�Q��>���!���FZ�2������a�2�"�nZO�S�t��������Z�XEc��ZX�8��r��\6�j��]�����v�	��;,�X�4�����B�!5fa����z�GQ��
:@��8��
^E+��� �����DmI,�!v��uM�lY"�/��f@Z��'�G:(�C��������������Y� ?��!���@/�4�\��4��m��r��/S�r�����R���u���A��Y�� �{���tRy����*�)s/=`�;F��-�u�T6�c��JJ��p4l��R�h9tG!������w���x����W��V��7���>M�:�����������Ds���ZD)]��r�\�JW8��X%_]���q��E ���g7H�I4H��-�!�Z^�Io!W�j`�'#\������J��>mc^�?�i�a2C�:A�X��>x�'����]H���K�E���9w��1��C�]����
��v6��.W���"g��Ixz�"��@�K������B�%Ggq�J�L�s�����<)[�����0mP��W7���M�,+�T��;�������������_��6�#�<�z�`������~s?���5z�@�n�/677_i������BrI�m~*��������_q�$-,dEv&���L�z(����:�8�p�#g�:��`i��
��Q��|@�G�90S$���8�(���
I��}mn^��������h�N�2:w��CS����@l�����2�E�^c����\a+���f0lC��+�eU�eZR:%N#�?t��4���P�~����@����2;�bF�U��d������z��+���|pg��
�b[���two���O���9_��? 5?B1
���&�o��;b��E���T���g�����P6q�`u���l;.�d���_�C'k��
��"�k~����sr9���%K�cA0%-�������(�y�����z���������t��eI���+L��-�
H�at;u&�t���|����^%=����H��dT�@�'E��o�c�|U1$������e��=X��o�/~g���$����<���G�,G;�k�u����=�=�:B'_�A����O�p��.��[�Q��`��,C�zslPv0L�,��x��t2 �}Jw���I��'�
����������;
��2�p���]�x�����
)�|5�cs�X��t����M�W���������{�d
�p�ce2spJ��4�%�0%W-���T/��7^��0���>��|���������?�g���G0C������<54)�V�b��^Qxca�Q{p����O#XH^��[�Z:f�	t�w��������S(K��v�{DY����xH�y>�u%�RX!�e����X�AKHYkq�3V���MR����waza3L����u��&���!�-A�:V:|
���-��� I�3��bB�
�|0��z�HQ�������������raQ�
�j�Evv[<~`{�s��GB=�6�w\[�c��?��e(
��H�Q�c���m�7�a�{�l������U.XyPK��������p�Cy�����T����I_����*��l|��`���'x=���)j�wF�������}�Y@w�~���l!����&`�����L~F�^�_��
i��A������G�r�VdY`KXB�9%?� h�i�l�\��1]�x��&<��rY:�">�HP��$XqSn�(�Y�8�M���������o��.5/�6�I'I��9sy�r�aG}-�o�����/C.���bt��$!���V�H�&jG������u��^��&Zc9���;jq
���>%[�_�m��|���:G|�_�K~�M��=r�m�1�d�=������^���fo��a�,��!	9�m
�ivFNC���������L?\0�����f7��0���
�N������G��������x���+��a+�������r?�
�=���=���vX�� !���oC�o�^8����������fj�:v;�ECd?��%0=zb�����#��Q�g5��� �a@o\-FG���?�
?�S��+����W��-��5����@?���h�J�R�m��Qg�����3����Td������])e�:��*�3. ���*>]E�f������j��5e
s��l��V,S����n������)�v.[�����W����c��(���T���Ph�}���6�{�7��qsPK�o�T��"!�o�9W\Cfe*���W�J+Q
,�cT1�*VNj��l��*����I�������.�^�\�Co����2M�����A��8{�^p��HsY`K�.���[O:���������
UA%��yN���hX�VA��Y^���/N�~|��/���)���0����>p���p�k�CEkP����H	��R�e��8)�]��e�K(�W�����tEMw�n��P+@]��{�~�u~E����+��]��JT� ��\��_0�=�]�<����6��pV��i����nk?����v��e	�%�o
�����k&a�>\#��;��N79���W���� e]lW��U�������������
�B�?_���r��n���	�U����[c*�q(E���W������t3��{�^�������0�����Z2Z^ET`K|���9��.N�	�y��R���	�F=�;U��6�S����&�(�
����i8�$m���MJ�L��*J�a��h��q�v;d�����=����&\yDc�l�:�8?�9�(t"i]"�On��(mi��N�A-�EE�v�7�d�� ��*P+x�u�����r����	'�`=Hc��	'�s2�a�����!V��`J�Em�s;��,�i�����K6�h6aO9����r���G�&%����b���`'{L���b
"9��bx\n-H7i�h9��[���d���N�������Iy�%�����*�}�����R��(���7�����tA�	��^��[��/�W�4"3Wk�K���v�u���vc���[*��!�h��mmFY�cl+�|��G>aV��*>�Hs�������d��"l��W��n7^0�%13��9._aP:���C���m��?��OV����P
��l����y�F�����	d%M�}��`Ha�{WoZ��S|ya�^�7&=p9(#��7q��KY���G��>4������;����#�]�Qn .�$W��M�����8����$���:�����L
|�O+d;=��K�������vf.v��kHIPc�g+�
F��X����`����v�%P���u���P�ZX�����G�U��"f7L�����{�C,a���V��z��~�lb��~�n�e����P��!�U����"<{����9��]JL6E��������Y�.Xh�Z�(=���U�R�(k���R|���w���~��|��9����4�^����\b��K�>f!��I�����o��8����� �E�l-,���!4e�'�B�G�����5�K�����_q"�����N����3>fAp��:��b1Nv1��]~,�� �K��q����.��#3�^W������\A}���#��!p	�
�"�!B:����%f���,�Gy%e��]K�������:����G���r�R\����h��N���l�_O2��.JsC�
��U��|=J���,�I���!�s�/�!��O9�1'�����+�P�>*?������a���
&�r6�u�E�����U��E�4�&T�����.�%�N����4koT�*$z���^arC�J���E�����gRY�3�jZ�>F�4J���8@���=�Z����g�F�)E�{$��P����p2bO�NV|;����R_���p�-JK�R�de�����F�
)���0/�
E~��g�!=;����m�M�;jS]�v�������J}�������ed5�2&��<!vEC��<a��S�������ue2M�V]�(���jTO$�5��-%j�4L� H�@>�0$]�MN%
��+<���:����9#���v���Q�M}�q��J0a��Q�*�m���3;�vh�t�I�M��P���D����_Zu���m$�`�������W$Rx�}�����������/.�/�c��N������)c
�)�=����_T�l�R����W�g'�nG&�9|�q��s�$0�{���^�rO��Q��M&Q�'����?�]�������v[�|�1 �`Z���v���
���{�v���L�������x���E�A��&1&-�}�6�W\����G~a��|��gW�����U@Iu��L�����E���L�.�|~quv�Q�%��G����+K�(�6L�%4�����>�
O:����7����>C���O	0�����0�Y��),J|{�L���e��,f�4n��$���+=$�g������w���g}�=SyJx�p%q,/�9��3f`��q\C�Wo���������X�sa�Q:��0�u���W���	���D���+�;���ue�W�%�jj\
�L�T
�&X�����i V�\�#��B/�����md�oQ���]��H��z}�����Z����������S]&@�����F������->��}�2�b
nA�r����@�T��6��i�Sg��6�0���(�xFBroII��`��K�Q�G�5#yY7�p�����J'������>�S�K�Qv�U�Y&���
0C%U��r�������|� ��h����g�\V�,�e3�SZ�����������&��j��������*�d">�3f\�
R[XV��������g����f�K�fJ��r�X���1e�vEuu+t,>�����B�|�'�6�x$�7V"i��u�2B�9�j��q��;����3�u}������oj���i������;T��_���V������>��Y>B�Na�[�lY��D1��c���E�l�Kgt�g�*���� b.u�d�lI���
+Je��q�_R��D��Z�����E\��}}�����>��t^����\�-8s�s�3�@[;�YW��.bo���$��!+P/�Vw� ����)��U���T���2UGd�!���'��4M,���w|rF������0��L��/��v^�C��B\�G�W���mS�l:,��^���9n5sa�rT��g��|O���5��pv��	_-�"��|v��2U�I�������:=��N��o �9�>i5P���(�vU?�� �)���-�n
�;O����x�76����~1�������x�v��>���������=j�5���
Q�����y(%���n(���)]���t�8�0g��j�z�,3��3*Sqrs��M���_�B|u�t��J�6�C�)
��P����A7hA����P"�qE�*��>�w�M��H�k���|�L]�u���T.��RQ������#�g0����
=��Ws��H�TE`[��`q�]�O�c2;!����b����Q�w���]}S8|�P���������R��Z��T����x
��?��;�A��p�m������.o��P���3�G^���jrZ�S�I��?�����NE�x�����\L��y�;ZE�9j��d�{�uY�w�����ytKW/��l��)z��e�@��?���C�N�bD�d�h:�o��^�"K���H�;��hoIZ���.���!>��F��	�v�������"2�s�N����[6(�M(SBI_;���g?�x�����wx{}���E��d2'M��8��j����.7��v���a7������� �g��������0����O�^~�n�h��<�"o\�����sK`���	�D�����;}R��$��w�d��]�H�bQI�	����A�yh-+�=�6��0��j@�p`��:�}S�L���T��������"����@��g�s���%����W� � ���T���5��/n�*������[�e���;=U�o��T�k��
]u����)%[:��V��--�J�;(��y�����M����'���������I@uaq���������&O``���&}��)�Wd���]U�+�c,n��:����vK�?��v�7�������!�(���������$����}Y�E�]D_�j+9�6��q��Y"��&R�OIYx��Iq `<�<�U��6��HM[���LI.M���Ib'I��C����*�I����F�8����������8aTuck�	L?mn*%m���:jQ���L�.W����9�d[�Y��h'��]H%w+,GX��������	tQ;2P�Bf��j���jJ��I��7����kp��OOB�%����Q\���\\��-S4i�j���5L r�J�w���>�v_��������v�1���sNzn����'��Z7�q��q��{���o���c.����-�����I�����c#f���|'�KF��.�8�J�dG�m���8��o�v��f|.g����[8�:�M�����4	>�����eiK����m�]^�����p>�(||���)���<y�{����Wgg�g���N���vxb�����7q��_2u�t��gq���6L��p��i�~vj������"��Y�cU�Ez��oZ����QZ�z*����d�~����3����j�V(e�&O2��`��H�S��{�)�"��*�L��z���_�?}�M��.I}�B�+�����5X�-8'	�n��%|��<����<��$���wRT�i*AH����\t����n:nK��8��o
U��$S���6�kB����h���dP{�<B�'�Zb(~P-f�\���ra�����G���V���{��\�PD0(�}���e��|!��������$�W{���7�j���������Rx�2�%��������
�����-�'b�<=���x%3o�[��lP;��E�tu��_�=g�d���VF��~���&s���*c�c��:���
�4D+aY3(�j���h��!���}����������,���dq�y�@n*��E�aEw�	����{2B����x�J��`;�ll�g�u��&���-�������d0O��h���k��A�y�o\S�c<s�%����B-mp�2���P})�;r,J���9T�Z=|��WB�L�G{[s��:�L��d����&�$Q��#�N�����OX7�d|
��<��)$'�!������KJn������V�������-WCd>~M��Gu�.���e�EJ_}�0������?m�x�x@#J���C���x1r��{J���J�e2��N����[5y����
�z���������e���>�
�=���*��^Fr��T4E�~���5���W~�K�����Z����!=j��kI�P�E��#��6j�C���a.��v�[C�'��z�1��4�����k���"�LP�J���?|V6!m(������^����KGl"������CP�`��E4��L�C������u@���(��e�����t�d�fz�������k,��d7���������8���6�1r�g]����s+RJ����/�m��~��b2���8��7�TL���/���q?f.��V�$&&KV��aL�=��������mGnt��(��J��rf��� �:�CTi�S�D���Y�J2���C��Z�M"I��wi�' �"��JQP���>c�m���8�iC�zTrG'd�
jxW���z�����Q����%�j���z9���.����hL�8,::�1�x���L�H�y|��w��W�"�e�������������5�Ek�)�a5��Crp�Q��D?�7�x�~Fq���c�!2y�E��'���D�����G8�a��+Y��2��U/	6+��{
0�$+6��A]�K�t��X�
.���Fd/=�m��p�zSK��?A��@	f	��K����I�R�$��7&\�~���x'�� ��uE��8X��(��f�)��q�4���knpY��BwO&�}R�^��?t	���S���GDN�Z����_������V6��Qy<u�x[{%����i{�[���Y���Eh
h��o�,;I��T�W�p��f~�6,��\b��r�/_����D��MY��\�����vs���0:�M��x��P�����(&�gKs��#�����h5t�l�*�5�������S�}�7p��%�\#SgOoA*"�}f�<�\�,��� e��;3�2��-���Qm�
�R�@���1$b#(#�P}b�a�F��"���u���� Lvr��-��8��A:��Nm�c)CG�)��2�9�!��nY�I�+�#�a����}!���OPrWs>�fb�g�}��[G���;���{�=��h�Qh�c�I���1�Js8<�����
2�0}��`|���=��\�������N�������$L�P��}=��m��0���2��+����]��L6I.�A
�n���B�=�hs�)gGdN1��76r,/����|���@k�o.�Z���#�/"�2d2���2t��9�����q�����[%%�kqH�����Kx�����8{��d�X�T��|��e��9Y!�v����gk��j��4����a��i��p���N�2$��zZ���z����y-}J��r���c���	@p�*%��d��?�k?eN�e�#��_��*�����(-Jk�S��M1"��M������.��-]�2k�	dt�E�8���/������Z��>rk"^�� ,6��Yl��	��_��X�����W��+���cr����+:&
��rsv�se���+�����I}:@���E@�&�.�q�n����+��5
����|���9K:*�j�5�����!�Ma��A�����3J��I.�6�����n��������|L:<��4r��Q�����Cr�F>�����-
*�����>%������������������_)9�a�\v^:k�z��`���6W�i�6�N�n4�5�n�{o���Q�y�_xRb��]�Q*����
b7*����Q������c���������dl]�q��a�f�"
���`�I[���9��(w�{J�JC�cd�-R���OH��5�G%������`�����
F�O�{vL`��l��?H���c6���A�N���&qlSr��+���v�Ws�/:H\�>���Y����$�k��k}�"�p\����c�|�_���6x�hZ�Ea v��K���F�H�d��p���8_L3�����D^���'T!O.�jU�C����k�z4.�����q��V��+q'�XbV����U���8�k�O�Q�?\�+t��&��{q��[O����z;y�4�#&NVQ�8Z�f9���)����)oT����PjN��	],���6��=4 �7 ��/�O�y��-3�����S���&U�<7���{)'��	��������
���Sw�}�Dz�U��"�U��.g�t�����p�o]�������D3��|�?Y����y���_��tl��F�0]�f� �=o��7��m��4dm��������G�>�V#��r��YVg5P}c�*������|�W��/0X�E4<l��
"�J����t��.��=l�;�����xj�RO�/�n���phw����\��������-Q��6��JT��.�K��{�������o�NN�.� K���Hwk��L�^���8|�~��G�f��a0-�#�X����������]l�������g��>��a��YL������e��n�H��p�-	�1�����mT[��b6�k�r������v@�Gz�v��������c~���k��a6�=�T��>�7��9�h���T�X�Cr��2gg��)���Ka��L�>�.s9��w|�9t�'(F��m�*����`u���U�5���KG�J������.9����I+��l���
�����D*6r]�h�k�-g��>�1�S>>e�}����yCn���a0�'B������=aBOa( �MS������QX��GX�k�_��K���<w5�[Z�Y�:&E�2�|��FkwtHj�y
<b������|�\���������$a����Q��5��o��Yw��i�V��d��g+�|������L-�\u�s�
�)$2���dmX���}�EW�QG�#1��tw?��a:o0'�f��u�u������+�|�H�n��`����V�(�h$:�|�2�e?�_-���E�C9��3H+�}���n��������
��V��	wz0E��0�8*����)..�!	C��E��(�����n�=���2�������"{�7�v�3D����#�b�5��������T6UX\d[N�S����H�!��@G��aN��M���p��s@Fe����-��n�_>����L���f��YV�6���!HRq��1OFU�%ws*H�U���5$���rf�r^�K����i�/S��TL�sf]:�j�f?��V�R�a��R�;���>_%�m���[[ug�,��d����J2��s�Ty�U����E�9�����O����<�kp��gB��Y�������wg]���u��U�m�:�5�\vb^,&�c�;5ET���������r/	����X�U�(��~�{�E���aRQ��xYg?�'�-��y^������Pw�Z\���=�Y\��t��Fg�\�q0:ps ����8�G1��J$����<�����b��)��&�n�Cs��>��z�F��_Z��en�t\j��x���U��Y���	�EI�J��&��I�t��;@E��s/��	N�i&���Zy����4�������Q�eu[����3����w����V��8��v�*���i�Z�qHeT������'B�����G���$�_h���Pu�
h]
�����Tq k�n%�p1G?
��Rf��cC��������VAz���9=J���|�P��-PW��a�4������Rn�����W��+�S��Ez�F�f}M5�o��g�G��AK���d*qRsP�!u����]�<��
���,�*RV����X	K��V�Z��b����T$��h6�$a��ij�m���:�(f��p
pj��l����?��Dvt����u�}^i�A�/�#1�<F�Zb'��X�L�<6k%������;u���>������^�_��_��wv��������F���p���jY�6�����^���Na�	��+G)Q�I<�F��gqNc��o����u�[�C/�U;�P�q��!��
o�����������O�cx�E��:���y�T�I�'}u�uiGtZ��/���Go|��T�����D�,��V~ALy=���;��&b�r$	(��s=LV������k����ef�k��I��P���H���U����?Hg�27$�^�hSC	���0)�'��v����Ph<W�"�Jv�Z������5k���J/P�yR�K��S����� W$Jk������d�`8��\�%\�$�V���v(:$M�����qN�������:�
�������z�lG�`5@�=O�����-�uE����%�4��R��$�8I�
��*z� q�fs���\����;���I*�I:�L�,.O#����)�6��(�S�#,R(��&��3��g4��#Q���<��g�)�h�S�A�/�������J����5�h��K��Q,�c	�. W��|M����2��XX����B�C���8�3m��r1\����T���[�]@������{N�J�s\��K=HkS�h��:"N)�� YC)|N��r;����<�����r�����s��k����7��\(`{����-���ti��R�) f+�(&�V��O^I�"��k���&Om�~!5��U�����o.����{*-D)0	~�;�����H��.i��2`S.��3�������YY-����VO������]J�y�����;p�����f�����R�;;=��[���+���"vl�hkeD�J5�^�Wm
������������O'l�(cd#����0������?�ec��a�5����v�����q�)��?{O����B�-�G��r(�GI�o��_���C����Ge�|��WNsi�N��OM�Q>�yt{��63q���� �$�D�r�2��2��a!��e��Ng����{)7���	 � ����� 9lX���-�XI.�^��Y��w�����R
��VB�����2(�9LldA���8H'ZV�fT����L1>/:�$������y�V��^�["r�����{s(��"��0��3�S��u�|t���I��,�4lSo����T[���&��6x�b�>�<2�KT����9n�yg��u&�R>�%s�p]^���K_f����_���L����hW�Y�M&����Xd���'�%B!E����������-`���"d~4�N�:e8sm4��ck�r�Q	h�F��H�rs���2i��#-���#�Q$�y]=%����o�b�7N��	��=:�����u�%z� ,0<�0 8��d��r��M���w�$���0�7����@����l��`/�{6�������.�z�Z���%g�~���fyB�%���<}��d�GF"9P�T$t��"�D����YH'�����^3LV�u��'6�.h,�+�wyL��9�4^�g���I�S�r���;�f�?��Y�e��2�q����\X-���������6As�%;���2I�Yc�	�m��!eF@@i4�8�	�_@t�E��fe�L=���f*���j(���vb/���QZ]�����h���_�S:v��Q���Pf����b����JQa�
�1�j���P�r\�����d�!iaEb��3!�y9���N�d����=�� �/[O�t�M�Q�T�>������{)8�F�������M���K��Z\���������yz����e�`�}'�)������|���jQ��R2^�]��9�;O7rL��	�}�����+�_�j�����<Od5@�|F/�U�A��p	
x�+Q�3����I^��Fb�����k��q�� ����4B�sYo�J�M%P�\_�~�C ���q��Sz��J<�M���8�Zx>Db+"�Rp�*oO�N��{�����s���e����)��Z�
�s�z������u[�������wK�m{�[-3N�K��Dq-���+�0w�"%�f7+��U�]��0����j�R�v����I)_�@�P�-q������A�#.S9���H5� ���|�+��]S�����Q�k��*�6�qr��uG���5�v���j���6��j�m�j�{D#c�*v�d��T���;�I��������o3��A0����}>�E��4^��8�����^���#r���2y����8�dK1���� ���4�~M�'�Qy��9;��c�rH:�
���u]��LIR��.�~�	�2�7tmV27b�3W�
GN�o&������;�������Xmr�
�e'�������o��?^�]n��?���x�
�g���hE^����$��&��+#���@R�3�|�0�3D�6k�qt����O��8��uuE�����L8 ��4&��)
s��~���7������6�Fz)�/.�R�4��������v� ���k��l�����c�����l3}O�gg���LJBnk����g��D���Ce�sUX��X���v��`�g�&��%������{v������t���E3�S�zu�I0�R��%O;�"�t,$4��.u�M~�|����;VoS&\]X�~.����n$��J��b�j������X�)��
�b� 'c� 1/���YjS�A����a:�3����&�����y�d�2�-ZQ]DF�9A���F��1�4��
����V��^$G��h=�������������Wo�Ns���$D�������5bl�U�
��.�m,���b}r��*��0��|d�u�}~���F���Eh�1sFg��%��I�_��M����G��N<��
�u�����<���	VL�3�����C'���%|��k����F���.��5Y��o��p��LQ���)�rW�gv�
�/�"�S���"��:���n<}�o��d��pS�]*v���]�Y���q���=�	
Y�J��72����"�Q��c���O����e��e:K�]DA~�,��%a0P6s��������Jz�����Q(�����Z��!��r�C��S��i:8AO�?�]\��@�����g�,C��N��`�&�b}��k�VW�B_�����R��Mq���N�/��W�cs"��R�L)���J��H��"^v�h����8:lc�Ga�����b�6/P�D�E2��_��L��hGt�C����)n�i|�
hn����e��ZF=�������jmk���e/��g=�H��g��$�9VHK�!�%���&��2�>m�ok�Z�g�ZF9������>�YK��9�>5uA���Dz���l�Ody�L,��U:��WZ'��$�����|�XW�����p���B=������/N���������\Y���`�\�p[
U �'�8�k|��������&:'8��-���3��|&�^}����^���Y�������.�j[;����Ww���W�4����%st����M��#��M��)��D%����?���Ul�1\�}k�\���}`T�q1��6�FQ��M[3��$����9��;�vE3K�ze6�H`��;V���q>�G[�<q����I?��Pzg���S��x!O�9�+��S��|���j3m�l����w��0~����R����U�~�c�����a���|���(q����	t*�� �����B&
&�P����u��C�{���2���Z�d�"g,lT�`�S��E�:���<����[�����T2P�Nh�t {�����%�`�h�D�B*W��h/�M��h��Ac��Yp���jKY=��(�B
7^b%����X���=�#��=��D�0n96�K�}^6�b*3�Yh���jq���������u�m�;��f<�q�)>XA����"����!+����������D�������o���i����|S3�1��	�h�;��N�`�w�]@�5|�p��'?����� ��5���h����Q�'��*f{,��������/�y��'�0�AnVC�/��i��6�
���;�X#z�E+j�!����K���<�^�|�����"�3y��5�����P�D�-)��%�q�i1������2�w>��M�H�+Va�/�b����C�E�h�7�	�3�2ZL�	���'����#3�4",r���i%w?Y>|F������!����4���V*�q�5��u�����Db�$6���;�N�Yvf���P�������W���I�%�\|CK@�X���[xL�x��;�)������H��a8�Bq�bV�������U����u$�d���I���{/&����p�{�l���p��������|��p��A��>6@Tl�kc�!�������_�:y{r	W����[z>���(Q\I�W���a.m�B+�o3���fI�%�h&�r6%���.����o_�o�����^�yt�-C���N�9�����{�r��	�W)���OG�T�9�H�k�/�?\�_��_���d)����������L�l�����O�s�PWO���=����+c.Y	tR�Z
���
7�Nx�l��v{�l-\e���a�����������Y!�/�NN�pN��QN��0N
�.tO-N�_�mmW�XB��.F����J�9�����=���h�����G��=���KP�����������s$��@��
����:={}���u���O �~w���������O
?��������x�f�y��#�������>�>�zsryv�}��H�.���������cq�(��������������P����>i��)%9�r$`D�L0�w�U����]sbd;�8��7����N�s��#J���Vg0��zb_}�8�R���v����~b�^����Y>�1$?cIH]����d�F��5D%�rtTb5��Z�b�8$$p����O�`RK8������7p������#c���I�>r[[�����n]+k�L����;����I�Y!h������6��r�l{�����3��a#w�O��:�!���nC�)���#�;�p�d/��e5��x(���s�	���(w�A�[����tn��s�wGe���2���m�xr�th�l��.��#���-[$Y�@��:���2=k�>cX)+j���T�
`&���`���t����4����F����$�b�<�����
�&H"V�����K����KB�(x6$�4��
z~�z��g����b���Fa�\C0�	�������2����H_�P�����M����[��������(2o�@�:�<?yKW����E�AR���u�T����RNc������1HY����d'f���L&t����jRW�g������)^�qr������(�&1���0LA7�{-��s���)E�h�4
�l�������6�������F�axZG[���F�Y��������
]�f}��=.��YLt0e�Y1��Yx.��K�,�7����$l��kz����A���xi��*b3�tlc�c0'��Ws��>�wS

��;�{FK�)�C��fT*E���e�ii�_|V�<G�I�HU�Y<������r�cTv��'��:P	j��^��z�X)��������s�P ���.�������S�v��1g@�F�	�cenM��1B�Gd�n� ���%��%|�����B��	'9i�A���pQ����M�����3r*�9eDg`Xc�I�&��8_jn+~\�)���b���I�)�P8	)������Y�kSbp��S��B�X!���
�'1���}���$����?�<�%�C����	�$Q��E��~x����3��]�:����dX�:{�"��)�w*�?��1��|���4��-P��S��~��S�H&~5�I������p�����,abB�%���N����p��=
�m�5��y�h+��� ����8NBca�e[���bS���yWh�S�����hh�:X��N1��H�{�*GJ]���|������e���z���S+�g���?�2������dJ��Htd_���I��X������-@f�����������lSN�v����J��$K-<�J���T����\�����!�a������R{����-,Ia�J�F�i%�}���c��VVh)���s����l������h��c9"�r@�hM��K�(0h�q�~�Mz����nu@��3T��l|���D�RkU���+/P�����a��k��qm�G�0�I0�$�+���m�3b�~o����Tm������.�
�9�n���<�Ib4l�~�m����V�d�9��=75�����h���'�r*�����vN�&�mg[���E���kM��4���h�;�G���fso�����5����k�-��US����Ck�'�jez�rm�"�J*F�����6�
<CM�))�H�"��&u�
z��p
tw�JW}�?�M���S���m4����R
mED��u������=X,VT�*y�=����{C7 �b��z��v4��~��u�|�4y���!e�����E���V�e�u���L�2�]Jg���_����J��'
��h����o����'�s�y�W^2xi�����N���T�`_���
�\y���� Q�	W��+l��b������C�W����{�@�����98x�1��`�S�o9n@;{�Y��U�����*1v���v_�v��N�w�z|�C��� |���=���TU/�Q*%��X����]�co�7CNLAD�Gt�������U�i9{����r�*�T������"��ZRl�u�SE0}5�^��f�0�p01��}��/�X�8��4R����#�L�|,�h�a��h��]���<2y�����A����D����O�8G'H`y�����/�gQ�����mTN�vCy��|��l��Qnw��P�}$*������,@g�K�������?��1M��Jj\�u>"����Up��b�_Zz.Tl�r(\���H�h������Q��"t�c�F��.�2��Z��Z�����9G�{]�	�"�B�H���f���Z�������
�=V��K�%��c�u��Q���cq-�:]�tR�U��A���9D|��E�i
[�.`F�4�b�mb�9
S�L�����F��]r���wo�������/����S�j?Xb�'���:49Px��bF�Cq��;/�m=����+O��S��_I(v�,���in����9�~�<����Q���p���n������E9��97���$�GI�21�iP��hN��m���Q:�|�:�|��"��R�����
�iq��9�w-N���3���`�o�P2�4����tL�i�A�5p��Uj��V6waa���v���@��m���|�2:O$-F��Fr�����m.���<<k3?���^�s�2#���&���+s�H���NYf�:)\�&���A#�� �>�{\S}�W��}f�*d��|���5�Q��X���L/%��9�Ml���$c�)�V&��SF ���L�/L9���@=����o�\�|#�Mw��#�^j�M�D�s�w4���@��{:�b9���bJ{J���T
�6����w���u�S�����#�����N�E�����8����q�,�)=�����)l �4B|�(��N,�^*CQul��#K+���V�ZeT�7Y���$O�3 ��_Wu�����rV�0Wd[����w1�����kV2X�`U�,r��v�z��~v��5������~�u_;�s���
4�=�}[�v�73���k�u�J�MQm�"�"v��r#7�]�Y��L�������T����eIg�p:��[�������ND��RoO�������n7�����y98%N�>o)���D7p
3��%�X��H������i�\s2�U��y��c��(;��!'1T����j�3
B�����;�<_�!`>X2R�i9;��f���v�z�f��T�<�C�b��d2ou�`6�<p��$�����Y�y��>����cq/�[�m%��������KQ�R'��]��1�a� NN���V��X�Bp}��@9e�~f7�3�����_��c��������f�6�u�H�03.��n�zn��E�^��6��D�k6�����.�R`��7+���{SL9�w �n����H�iwZ�:��t$�z�����|�G�y;��fqF�[z��
G���H�[[�9b�Z�PCR^�Y|[��ORr���|,�)��E�
��TR�H�����9/}l�Cw�q�f�k��>�c��wZ,6�o�'���l6���|�:��sU8#y]��2�d$)
Y�%<��ZhG���Qpx�~�kZ���F���"9MR(���.e�����>��~4la��3����^�{��a�d����������oha�G�����s���a:�gOX1ds�fa�������/.LvZ2����mDaOX._�����u<Je�2�����VO����]�]��W88������A�|/C���L*��&c�J�l�e �{a��w�����{�G��)m��D�T:�f~C��h�����_�������xr;��c:�(l#� �9l��8�ZA�3�b(yVP����;�r	�Y���St'0��rT>��a'J��S?�4�H[�������/�����Y���-��C�O��K�f�9�3O�bA�%:z���'���
�B�y��AU�"
�U��V]`#C� V�Y���Zm��4�JD�vo�Ud_����Q.��Ax��/'��<-������8u�O��)�@$Z(Pn��)��+�}1�%<��O��'�
e����^����Iad��������G�`Wg�u�Tq��
7���	s��q<�zr�.�:����E�D�>1��F%����V
�4R���'R��f�yM������W�q�u�8}����������G9���#2�0��9V@��2����D
��1A�.�A
�M��������#��bK�h)�y'�Y�B6�A�0�t�����;�7A��Z�%����8�[��3����[��f�7����L���S��������P���j|�|[fUK�<��D����BH�K�Z=���d���
&g(�.����Q�V�`�'��G\]�/�#���-S;'��Vvr���-3�+u�� t;�,�z��sI:���-x]��
�6�]4�m�F�J[�?1H
'�V�����w��Xc�X��Fn�R���j�v�#
���Sx���$
�0D�B����H&�����W���/�j5[m�Yk`5��KV��jt���*1����{:	�O*5��|z�po��I��}d�2�j��!%Ckr����v���k����Q>r�\��%+�
�"Sw�b�C��iKp�*��\�(�H0fX�8O@������6��c�����N��<`i�`�y�����M�j��������p�C����:8���~��_s���\����O �����g?�������/%����'����f��|1O����New8�Q���-����f����svr
�n�g[�h��i<�V�;%���C�(��������]���]��2��r��MV�3�8:��1{���3�,���"��(!Y^�oxi���5��JQK�����9^�,�IbRJO��W��(i����
f��$�-���U�t�
��~���,���]r��� �eE�����&�����~�aaB�K�6�Y���L���z� 9��@dN>�(b?D�I~�R�B��$�X�����1� ���4�@�{G�%0��)��-9LU���e?�u�-��c�l+.�*����3'�����W�n;K�EV%�
��$A�p����>t�	/A��l;*L�8�����L%e/�yGH0F8_���� ��p �-C��l\�N�:x���;L�;��[�����f��aB�L�����l�c4m����o{o��F�����Sd���%�VSv=��6]60��=�]��DJA�%�F)�����8[,��)	\5=s��]����XN�8q���r."ooSB:^�Y���!�h��H����}S�p�$�r�+�����T?��X�7br�l�n�����V�������zkES�u�+M
u�B�FTWI2\[�E�*�+f^4�a\,'j�YG�y7SKs��Z
�`-^���ir���>�<���wo�0���S���d{�^o��h4q��y@��}K�f!�����r@$�e����e���u�Y��X5���1H#N+.AL	w
%�e��0iJ�i-��TA,��Q���cJuM2�lz��%�)k��1W�N�.��
����9�I#>�������y�Y7��X���J���,���^#%�9}��f�l?q
>�C��h�c%�����#��~���1�Tg��+���|�A�_�S;g&��A�5�����?f�h���
_	�A���	��@����Xr�@�_@�fT�-_6�@����%v`����A� my��X�f�!��L	��
��^kg�^�����������U
��-4���W�
�u5+N6Q���5Ji���3�,S�����~���]��%����b����R�n������� �i�4���n���eCSKI]L�S\�����������BD������8t�.�:8�����b��t'S����|JW$}�o��x�	�^���[�
R��r�#jD		����O�����R�B'������ '����r��3pN��At�1�,��&��e!�"��+�~-�g����uP��"-7a�������v��Y����`�����%�m����
�O�����c��d�tx%P��m� ���:�k����������
��+��^g��.1�2��5�"�,�Y����)��#�����R�|M��+�YI{�O�T�K%�Y[����"���.���@�1� O�3&�r>x
p���@���S,d�������$v��V6��������[g��<{��!KJs���f+�+��c��B��w�cev��+P��/�bf��P~`��i4�M��k���^��t������.�N����e��'�&-4��KV*�O�XCD���X�����Q�8�+����f9�s��yqPb����1jw�E3��U���Z���S�h��z����SA��/��I2�@�jQ���t<�W�8<�Yo�K��]����xa�T|�^^��l;��(�lR�u�L����[]l�R/��l����.&����������X�^�H��?���L��C`�����}X��Z����{�zx����"&e�P���2�����c���0���<�8�_���<WH	UX`'��)�pIm�W���P3�����G��V�� �����<7o�ZOlZ�6����5R����B��gH���_��J�������E����!���E���#��M���9�$�)�
R�42��V!��L��A:�x&�P����Q�$f����pcl8'$w���_Gm���C	c<&Al.��Ydp�}�/~�M���F��\Ds3��u`B[�\~_F�(,@�m�Q�3q����F-pf l�3���d�`- 	���8y�HO���U���[L����UA�,���m�gW�lEy!4�A�����$�JB���hf��-�5����hO�g^|��hpt?/3�5+{;��z}{���_-beVE��*B���=<_�_r���$���rJ1E��4��~�����o�9�
�W���'��Kq�)�M���ZQ�w��a"�i����!��g�[s"\je�^�M�������(��rs�#��
�� �8R�A��K������s^] jD��M�|�L�����H����@1�_�	s[5����p:�����Md	���JS��8;lg-#J�����B�K����A��^f��6$."�J6��r�[.�P�t@�s��i��;���n��_Bs,*�Z�_[�Fc��;�~�����6�gQ�"�����dk��&W�f��/���[�bn�5t?���9Q��"�*�
gS�a�mY':@�{C��'��*�$�3����"������w������_r�5]���d6i]ia7W������/l<����������'���`��G����e1�����&����|�dre�����'��<y�
�`	��#����)��o�����z}�����(Q�y)�~V1��G��-�3M�?i��&�{i����in�Jru�<��
:���Nf�
@^
	��R�� �%����\�)h��Q��DW�/�~�j��=T+@��<�^�����:�N�j�S}0�>>9�x�y{q�Z��'�goO/a�<1�ja<Ok�c78��R\��{��N��������G����<�^�xM���(�����r�3����fJ�����{m%DE�p������<E���V�����\@��,s��q?�hI�`|�����y�����������_��%X��o���t�J]�2e6����|���L������%�u�|t|�9��������1�m(�lS������h�n��&����M0��W���Y�
\������-�[�l����f������n*F�h�[��	�d�~����2���II9Ul0(yO�	���C~�}V%��M��1�%M�������Tnv�j���.�����$��G��t���0�$NA	��0����&�	]�%�Z����3�To�����EGc��1] �!���4Mz1~���.�����a�0����O�R���O�Vso������V3,I��o�8��-G�R��(�x~�����#�0�.*w�`��T��(���������j�z�\�Cp�:}���h���@�Tl�Zu+��D=y��lIV7�"pn��R]�]����(�G��\Ri�:�-$j����������V�D����j
"���2��+�#�S�����p����o�J,o'�������Wc!Fo:�}��5Q+��	�AA[1+��uhul�^*��!8`�!&��3/�M
���R=I�c�������j
|� �b���7��(]�?��
������,1�����a�f��g[�,�TuN��8:c����5K��� ~�<�J��&���c����rh��C�"'�>@2K���A�#��&"�5��*�x^+��!dj��/h�
��y�'1�g�b��2�
z���W�������SN���bh<~��
BMk����;M-��]:y�8���9���Hp{���t�_�~��-�F�a��0ICa�
^.��w3�,'`�����K~P���p���+���\��xU�t����n��s���z����U�{z|�xTn�7�N_�sR���D���A��������}-��*L��J��{�y�dQ�J��������@���j�NzO�4W�b�4��Y=�K�����{�W����^���h�w�����~�O���?5:���#�3 ��>�+����4�.�,DIYAhD�J����n����S�����}dQ`��"��:v�%�w��c1sPLb��������A��x�^���5�1�+,�t�-����*�zW�~������`k���W�W����[@�mn�N}z-3!��Q%��cV�E@��L#�-�(_a��M3?)F��5R�s�s~~z����@��I������+J�����?�x�9c$gb�������A�0�?�����U�o�_����No)�S���s�=|l7�U�I����F�)����Zpm������H�+	�-8��~�Y�����Q���q
BKH���������8����JAT�������l�b�| s?w��cY���H�Fk�@8�t�r���|��f�C���)5:�<b@�O��6�Q�W��;98):�;"�"��w�Ud�����T�d�A<F�V�X��X�M��q�/k���>�#e>%�!��a����6�f����p�l�?Z���!O#�&|P�fi����!��A�n��j��&K����U���W��������,;w4 y�v��>�������C�O;
:����������G��8!�sr�9w��W���
����y>�����HNO�� ��>�,�P+-t���5&Y�?����+}��9���Q��cz��=����H�=F�Y+����`L��S�(Fy��K��;��%I�Y��N�q�p��T�K�8����cW�4���[R��Ua�L5@����T���0����0�)]�l�>�'P�<���A���a�1^��1�c��b�K�)�-�|D<t:^���-	�=�����?'Cp��>�
}����>����H:�d%��H��K�J,��.&�5�:@/T�}��Yj�SR�f0�����n8�F�Ldq�V�R��xF�iUxPC����|���1��H���&����Q%�+����R���,8z�w�Xx�0�Z�
�����"�r��*�*�P�U�����9��@Q�)���j��K��x[
�#��a�~�q�i�zU#����lJ"��w��j�d�Y�&��v�,E���0���\����K���$�F1s2�}�<x���/�A�0w[�v����������h8|	Gw,UN;�Z%�z��E������
���b�
�& OM
8[]����*�n�C���>����"%V-��_������o��F����>��������0h$��D���o�x�~!n:��	�S��{��S�������3�,������c��������$���J~����Z*���,yK�,_�v��mv3�+3������8
'�(��37�o=������m�K��6���Ki����h�������S`y#=����n�]P~���M���h,�d?�CH���4��;?
�1�H��}$��v����-�Y�<��q[}9��H�?��������M���
��F�h���9��>VSX�#)�u����X>�<���Y7��h�c���R�E`�U*����-���L��R��l�n	1�(Y�$�E�L~0�\N����z�<�����S��M8������8��i���%��k�,e����d��8e3�.���w��p����i?����.�����h� �_���&��C�r�?�F@����z���$��stygi�,��0x�L��(�Wz�������j4w��"��������[��>+~=A)��4�O��U�s�B9���8���Y���*����8��:>RG��kp3:>=��.�$����e�����%� �9J/��7Q�\��O��(W_��&A�8����d����r0����1��������a<RBb��h9�jd1��������xzO��
�#���C��s���e0$�A]�	�pu|=d]C��?��������M�X�`�T�^	a
"��}��J7��b!!M���6
�����CV�'�F�D��|`BN�)�S�j��^�KFQj����Y(��t�a
�����Q�3�M�O1������Ep>����|�O�
zp�@���x���oL�C�<\u�m`���x�C�dg��T�Y���U�fxy�	�PR��
�w!�]��A���S��)�iz�'IV���J�bBgIe�t�6!��G�i�S��;2���:7Wp�����T�.�n�*"W���
d�A=�M�s����f%��S������M�������@�������%/����P���j1?��-�W!�u�Rj�M��(�yc��R�>]��'��8Y,I8�7����>�Z�K8B0�4"��+ix5��l������QSH0v*��I�$RB���H|�#������`s:���>Y����@`��M�n:�/�1��J!>,Q	����@5�i�����������4�����	1��
�%��a8(��^��TO
�A�4�"Ms�W�� y���1�� If(�S�3�T����,�76--)t�7f>PM���R��d�!t������k,n��`���&�c�,S��7����J�����U��� v��1������r
�L�����=����1q#�����8m�]��)�x;#���\�U�\r���j�I�;O�H;��L��4�ck?�:.�p���SR�C	�	�������^�*�J�b2"86g���3��PR0�=N�Y�F���sd�B�0�O00�15��Y&3�7���+M�[�Y��!���� wAE'��g�k�Z4�N@u��Z�~��&@$jta��d����n�P�WQ���/��(��G�9�����m����*H�����"-�p�1��s�P\��Z�'tZ *"�����|m�"���o�!5��sV�J	�{�s���B���r��Ud5�ft"���A.���IOxV�H�S=D���f,YB��Oy����>�E8Osp��*:�g�������u�-�A!�9g(��G;�+a�]���M��'&���b��L=�<V�6_ [@�$�)[S������~a�D)d1UD���{�)�*����8���Q��L��k%m��]+2
���1��;8���4���{�d������/\�mJ0OZ�ev����
'��l`�y��e/���a�|�v

�D�5I��$<>TZZV%u��Q���8u���m���j�����W����T����Vm�?��y�����n-�[`����(�DN0h��2
����LS��R������Y��UU�|��{��I��|S�=Y�ec�0:5����g����y6�\�����7X�0zE�N��z��Y^	�G��P8R�K�M�9�"��\�w������}V�q�*�kr�]�.]��b����l�>��C>z�,�{�j�D��Zz�+QLX^i�p|��M�*�t����AqC�5����0H:�����C����rB�I�9��]1T-�E�.���!�4��D�!^8e��K1�����:EL.�'��>�]gIgM9��t����+�=����p�`����?���������L,�=��c�'�I0���c��B]A��NX��e������C�9�9A�2��.��Kn/�>���Q�i����%WT�|V�w{� :����D4N(��q�P_A%*F �-^.BB���4�B�h�,w�����UV��E���M���8�j?������i���f��l0������~@�0i������_�G�y�+�'f���O�y~1��=�f�"�s��}��S�r���Q�sF�t��}M�GY���%y�����9�����xVrC��p(���#��M������7.�������^4^8���>��3�������Z����;K3n=s1����0��c:F�k�	�&q4�����q:&#�#��C����p8V0��������B:���R<OKL�Cv��Kv�9+�S�VED�GP�����|lr������U��1�D���=��;�8�����r<-WDu7������A>��!�Q,��g�GpI�Ap�]VgEq,���lnI��).*�`N��e�������*���p>K+��/��(�B�O�X�o���0�e�N��0�Gx:���&	F������Z����qE�
M*
/ �+�����L&�1��eo<�mTh��i�J9�����K-�@g����(h�'���6�:�y`���� z������p���~�73��G���5��i|e�;���>�� `��L#>}n��E T�����75&��8�tT�������R�P�c���s�:X���B�������U-b�<�4b?8r��]�����fUy�������,e�bQ����u�W7��8Y��*��){���h�\]��Xd
�����o�-����%�x��*wG�,��Zx��(7Wd�|��e�0���0�7"�@]RQ`���p�)���45"7WFn~����hlld_�jC����js3�\�'���
�^�O�_�4�V ��-BdP�E��~�X�T�� ���]���=`�673Jm�{�t]��_V��0U6~��������^�BC��m�,�UT}OH�=�rqP�O���������2���7���lA��Rg���C� ��]5��B����������n�T���@���.�kk��L��W��}�L�����{�(���4 Q����[� �B1��)��3�9#Q���z����ktC}q��`y��>�%��S���|������o�m�,���'q��3\�sxx-"�`��
��aI�������a?EM�:T���)) ux�{�n��V�^D�;[����V.�F��RP��Z�i��B��O�O����{`�����tv��AH�[s{���U�5�Y�Y"�~��u,�S���)��q1����g���pl�Y�??�;�PD�=0rD�-hy����|N���	c}���/��?��5�,���N���z:��J`�c�'#7�.�4�)��x
j�����2��r�o�����A����oX^�[�?�������2��;9�IJB�{T�zk0�s/|�vJ�go�'��r?<��=U����.�.�k���0!����rf���n/�P������
2//�^�%��T�p���HNY�xa���'��U���|�Mbe���!�x���e?�b��=��r�*p�r���v������6�������������^o��=�,f���)f��.cI_�?��h`y�����&�)c��cp.6�G9i����y������EK:e���i�QT���D���������I�HQ����`��xa���!F�N+����9~�����y4������*&2�H:%j��W�C"��a�w1�B�f>��#j-5lg���<=�X�?��N�.~�J(�q;I��P���0�Y?M������Gj
RPZ��b�0�J���=%� ����&��{��^�+&_yr���\�>OX���,#��6�������\����u���Sa<d����Z������X����������������v�e�������?���G<��<���zv���]��4��o���L�}��8mo��7'��ty��1���:W(f���nN��aM
���3������eb��7h0QD�M�$i����r���SL���lp�#C���R7�����
��NEM��K��$�u�D��%��4
)��H��3N8"HG����2f�G4��aM������Jcc��;1X��s���G�j�q�4cs��D�P�3�H5<>�2F�\0?�y�?�J<R������5v9o�W�w�P
o�MI<h�O� q/���j�����w\���-����83$,B��i�P��'J���:\z��������j������'60$x�����#�t#�Z�tp!�B�7��W�T�
�+�������0��F���f��"j���G�����R����S��M����k���#�M+�3����.D�{NA�X:8GB�����P�Hb�������B\��b��U�D�&�e;RI!���j!*^��c,�1�3���Z��CP%���#���|@�]��.�������N0��2��@�G�x�c6��yU�Yu���k�s��#�� ��Gl����g�
�V�s��<��p0.{�~�sj)wG�^����������1�D�3�-��	��2`SE=�vG�Q#��h�'�O�����Q������(h8
�W��&���� �S1����V���LW��q*�R�3;����^�%���c��0:�������	8��Ks�QS�g���S.gAf�9��Gw6�w�� ��Nm���,��p!���B�@#����<��gBA�GM�%:��l���q�P��U*/8a9�F����|�*�����	5�r�*(��}a!�h�A���p�I^�l6��UA�7��%�$�X���`V�����#�����UU_�M��k��"
%�z�����G�n��)%"2�5�|s�}!�3���aU#u�f��|�Y�>P�^�AP�yra���k��_h1o	�@�Nq��0d��F�0k$�ef�mT�22 ��\�weCe���g�3q^�[5M�b����6�A��&�.>�r����x�h�PI��K�����f�
	9�?r*����?�*�EB/�yCZ_�k���,��gj������ �4��wsE��U��j*,n#+�z�	�}%��L(����! �h9��-��HW}�����V�51����#xeK����6,�����V�Q��������j�Hn��L�u�
�_:�b��H�}�����7q��C����0�1������)�LE�P�!`{�:X���eW�`H�j�
	�;�\�G���YX20���v�7�T�`��h�D�
�{Ku���Gn$S��t��������f���E��_�F�(�M�bu>#���1i� �@�i�>��0�D������Yu\@f�^�U������/$��-���������T�
�b���%�&���i��+�h��=�'t�~X���Tne� ���*�7� ��8��.�_���i2�8� (��������O�1��'��hq3N��a�pA8Q�U(�hF���by�����&����FI������[W�����Z��7����r��MO��
�����n!TkH	�M����5%CC�����u�5�3������jn<�aWV�X�A����5@B��h���h(3h�
���s_�c0�R3�~����FPr)����������������#8KNK�u����M�Q���E�Z���u�����v�����\�M����qt��������i��,g������$�X/9���/�3:�Y��������/�0�F�#��'6r#�:������=����e=q�ss{MI��VA��a��c#���p)�#�V 0ND�������`T?�#�G�9G��gi��L�Q�}���V�Bj��/;�9��$���O)��k�����,S2|�+)�v�6'~�����)G|~
q���[3��W��Z5����=���������z����	��5��,�}��J��:yN	=�S<	N�V1pL4�Y���p=�I��D	�b_!��6�����p&������D�tX$��K���\��aF��(c_X�YL�!'	V�X|��E��'���|A
�����7.b��d O��j�)��Z�T�0�J������uhI&6��T{X[���;�l���+�,��l��u����Xk��{<~<�\8�{�e���c�:Z�X���P��M V�9���H<�?����[4������6���!��K(�4L9��'�}B?��V_���lJ8���)���j�n�3u);���4�{D�����
��cK�Bx�0��a�����`
����ir.pa�����0NMf�}��U�M�e�3yI���Y
�D�Jo�X��d��
Du�0��,&EO>�KQ��#�U�6$����0_[uq6n#]'�a�jrb�� A�|�:�- }���[�S�0�^^��f r���u���9w)t�������:�`�:`#$W#�\��� */�O���#���b�o���%�%m�c��+���~��p���	��HUFe������Oo��8�m�{��_
<��\�k4^)0i�����A?EK�^FK����������=`�9"k
��u��BX'���d03,����&M����|4��T�T�@���n��S�W������G,��y�hi
 ���M��R���)�r�U-����u�4d0���!��4�(�Z~u���f�����2J>s��dMc�E��2#����Pg�Z�F�'�����T�hz+�f�����i� L��|E�	����Z+-�D�|B�f_E�+)@��.:�_���\��C������9�8����ZV���}�'e8#`#h�ob���9"���PFn���H�u�WT��r�?�Z�U����;�Z|�]�y�.Yz���$`����PO{�
��;�)�;;��4 Z|�����L�����+JK�z?�s��+wM���3yP<���3�q�+a&Z�����f`d/�@r�VK^,���Y�%�s����Os`fOB�8���/�k�myT��b�]0l�=���><�^9��kz�+�����pu�w�(������f�&uv����e����-VQU��P�lK;=��,���`u�95����@���(�9�:�L���}���)�������`xq!!v��>^$x
���Q��p����_���@������y�1�;Q+�����������b("�@1��.!��&$l��vmT�h<�M#�x��~�q�:�X�a���t����������M����)�:�������]�����_�_\v?\t��G���'��G���\�|!�29El�9,*�T �����kE�w
?b��n��q!���c����kBH���?}�ir<���x������*��7�5�%Md�T���hZ(7��.�9������xpUoe�A��$B��B8-Y�g!�O�p�D�:�����t�z���L�����{xiz|����=���C��Y�aE���(���_x|X�3�;N�U
nf�9���X������J�~&��O�����:T)�Ss������?�v��?�?Y�';Q{5�
�F#t<�M���G~�XA�����hk�t%72����p2��.����.�vD�)���1�S5}�Tap���t�:��X'm��t��&TK���b�j-�>+����a�S�%���~��T����U�������#��J��`�Q�6��,d�9��]�y�#��yiR�e�A��O��Z�A���p+NY/��B���e���d+�F��?�k������]t_v^+?>yu�9�Pk?��rH#e�2� ?�����4�`!:7�O�UE�#;wnG�!L�1w�d>zI�U��V���0�V�T��yX�:���E���r����m�<R�0�����Z����O�-�Q<
x����lt;
0��	�`�cZ���uNw���}�
���I2!P2�=�G�6��d���|awBx�J\E�c������1(I���C9��!{ X�!LIK�G%
^��#75���\��]���v��_�q�)�����(�~��A�c4��m�6�~�*���=��.M�R�,�������-��F�u�	����G�@�p����j�;�������S6 @:n1�W`���A���&_F����9�:_���B�0�7��h��+u|���,���
�k��8��A_�������:j`�m�pq����N�}���	���c��4�p��������D�#�5��[���P��u��Xh��?�]�`�T������(`!��
���!�F�)��b��8���/���	 8�xF���@���r8��x��
n|	$���8�h��}��&��ocJ�Iu���4����pK�� �|��tx�S��gpNE^�s@� ���D�$7c�W`f�_��<����N������F�%�	�@q�x�Y� �O:O{�d��6�����NR�cb��]��!y��2�Q_\(X��]2gx�u2����'r�#�N�������q����w�������Y�j���m
�)��l���U�I�:f�B7N�d�^��..���K0�Q#�d����-����n����)!��|`���4%� iK������X�r��4n5c;=me;N��t>�K�}�IdU0�N�����6���N���"���y#
��B����y{�mH!�c�+�,)���br)�`��8��z{1"���l��wuX[Xz���p�[V}5��.�� I�Q���I�4C�V00�������� ��<$�����D����x����c}���~��O�]���������qB����!����]2�O`��6�XI����4r)x��J��DZ�/�j���k�$R����t:K&�X�D��&Nt;0�(�G�O�yH}�	�@LRlQ�(�����4e���X]r��GQ���F���}�c���$�S�f��~��C�5����������TQ�����i*U+,:8g����t�cc;/)���
��t����q/8�`�����{�V���g��/�O�������{E�����]�\��?8d�-��'�KA���9���y�iWK@4�(���*x�~+205�'[�'
��#&��6�,1m��2�����n��%=����S<�d��d���	&�S~]���}u*�b������/J���Y3,����O���0�G������^����>��Q�����mJ�4����{���9��}\��,tA�9�
WH��Y����U�9��>�����V�I��~����]J�_���V��3=���7������
<"X��=�������m�)��R��:�Db0a�q�v��Y���#�
�W�m(#�ci����N/?�;�m�����a�W���vG����:����0���EF��K��G�1Pj�+=�.}e�zr�i \����T~{W�����p���)Y���U��G���W�����?S�Y�k��v�)�ro{{��+C��$k�	�������%�J�_���SMN��j����c�|�^�N��A��'6_L�:��&	B�
��t%�Co��{3W�����+bw������b�%
8o3
PnTUse���p�x�����o�@'j�g���0�(c�i�L��X��������K����d�mu�;�mr���k�)���j��
YS)��2��s��E���?l�E5�r�Qu��Hf1\(��?
���<pJw������S�=����_
��V����]\��*��u���0�Y���7!A��>��(U(�Hli�7f_�D�g1p����1]����~����H���(U��QRbQk��(�[�?/S��~(���wM����{�g���%����[8T���Q����lk���D6�b�1��	�����/W"�*�I����q,UB�������(Y�����m
"-X+�*�}�+���^���ho�����'/�xc��k�<9a\���Spo���i���4m��	�OI
<����&S�������f'kl��/�@}��e�zY���4?�#X�>��5�Ae��M�Z�(�~�=��_#ebR�M�|X�2t
�4D�x>�*03�/�{Y��K9
-2�#���z����4��;�]����$�
��VmK��f��������
����{f�XO�8�)�8)�0�I0�-m��Y��Q�j_
������>���@�m��aa!X5M��'$@�j�Tv<>1���}N#<p�\��H�5W����x�p�7jo3����]����2����0;���T1W�E��$����WK2���dx �l��L!��V��c��8��wFqx��0%(?VK��H�fV��2�]�`�o�� �()Q7�*�h�A�����lY�9�_a���$�k�z��Y`k�1��#�pl�����L�h��r7�M����>�q�v&Ax�$
r�#�(#�^�9p��.�h�y�=h��h������"������bA��-�	�
�j��������-��&��'��1�1�U��q��:���R������/��X���\��F���mPq�7�''�'o�����X�OWj�����W�:��PK_'`��.5g%�>����V����Q����[	��R�8������K>�Z�;M�+S?v�6?�'9S+�ML,�5�����"n���XQt�}2������#�x���5��SI�����c+A�n}�����pV:��w���]�z& h���%��\&��jJ��W�I�������KG	$�E���YuU���x=f�AeG�L��;�O�C�|}_�>a�|P�&�x�M.��u�Q|�j���P���.��z��F��
/���i����.�������(����s�!0W�oe�P'��<��*�fb��d;?cu�GC�6��4�:������p�[���7�T�*z���v��p��1����@���0���t$X�����6+�����p���a�����4S;2� ��C�1y%(�@Uu� ���C�"���}8"h��&)�����A��bM
1�ub���)���m��;Y�F(����C�4�;\���!���f��x)p�La
"�C��XQ`zd���|F�Y2a�4U���KH�	�R��F|iE�F�e	.+	��4���_=���F��_����x�,m���X<xJ�?�x�d{�I�F�����9JiV�!aR�U�@k��:��_�k�"�F����O"�Qg*
I?�UY��.C�?�������2X��X���.l0�������Z���!A7�0������i�Q��-U���1����������`���
W���_:�o;�G�K4HS��8��`�Yk������w��X3v�7��8����������~��d���c���Ux�'r����g`������-��ZN���������;2��eb�5;��.@�KBu���S?�����K�o~.�mO�L�\6�.���Q����<nA�Y��x��]��4���83,qBY�K�.�Z������Jg*Vm3	�K}t�@�E&�p�8�d
."�
U�}A;��q��:j�����[j�r�i����f�f;�Yt��g1����'/�:���a�/���=��0&��A���K��-r�G����W���s������)�������y��������a�%��!�{�����f�Yg�'aac���4�V�	���f����������_��O��jh���g��fVI��nr}��rG-"l�h�{�H��"��"����h~������5
���F��'�E��)�9�^�G7�
M��[D�2+7���E�/�RI��5-������K�x������Q���{�%���u����MP��/�-��wi�}3����R���-��cA�n\���&�*,�"�����\
em��l�
����Q�u�D���+
(�k�5��'����cd�C�Etk��n'{�)Zu=��%�7�����������*�f�Y��n��m�[�����e��(:��� ]����5�/���5��r+�
��$����M�)�t��n}~����~$���/���
���0����d�r���o��m��"}�hj�!K_K����B�(��K�^���xqK��������3@�h���[��^I�JCI-o'D��`���s`�m�
��be�&�TT{���}�-�Yp��\��q��b�������L�������%�|�����Bqw;����Q�b�,pl�uH|��p�e��=���V@T����P�X�E��!�62W��0�q��N���V`]L��������R���@�����������}�[����u��v�d�X�~7���#�����5�<��`���"���,M���Q�/|�Av(�	�E3v�p��>*����@52^��,�q��<%�����C�tr'��\��K���2��n�����o6�������!�`0���C- �y�50���|2�����q@�U
p�"�8�h�~~��
�P>���C7v����oy!����ik�1{�� el�����������K����eK,�G�F�������',��{o)|_� ��k5�*�+Qr�4�{�������6����^�����$�$L������"-o�%�Y�E
���,$��h�V��-�1����z�*�L��+�E9^M���-�����CJ�������(�����y��K�����R^��}��$�qgA����[��A�@�Z��=�	2r���AC|@�����8���wO_�|al����\J������`{D�uE���:�vX$r��d�l�~���1.e�(����o%��/{%���nd�Yq���s���Tr:O��!���e^Kz����D>�*U���Bq��a���lk	b�6�4�N�pk3��7�����^Am�c���m��^Y����)e�d�����;��b��E��e���	h�V���V�)::!�/j��W�h]T���?�TT����}�8q���5&g"��&Fc��}�SF���9�����������L9����q3�S�����:�pCJX��@�������4�2�tf����o��h	��V����k����,�t�b�M�Q��u�5��������.��:�^3��0����'���w�xnT_�UB3v��K���l�~�_:uWK`S�V�gsg����;��Vke����������l�������c_u�M�1H���%��E��o�O3���,�����%��El�6�������`���H���':�I�t�������������6xW���!��v����@l��\�i�V�]fgYd��an�pq�b<����-c����>��E���}.�w�t��^�1;��Br?���Vskg|k�E��+�t�sj��g���1 YL��<���>�Q��D�^������o0�e��.�"�Sw"�p��de�DD �b��S^��\��F�w_$n��
6�br��z�V�������[���}��xK�� �!� k�^{����(��E1�4y;��fA�vj����-3NFR��g?���V�����. &z�n4������#B�0�G��j�1FN\�q����\d���d��[^���>��R�r��X����p���O	�~��C�`\c�����1-#���cTz�R�"��}gN1���oUYT�)�9���H8����/��+���wGU�7{-d:;��Z������U���
2��K3��4����:
�!o��!���@�Y�@�������Dv����t��Ga)�?=_"�5R���W���iU-bE�s=h6Z[��uQ�x���v:����>��g��0pI�A&�Z�!��
a��spg���x�����7��D+��Rt!�����Z�� k�������b��)���0%:5�0hB�(��ab)� ����o���B�� .��B�8J�1h��{�z�����W����,��K"o���m���'�8����*�&B�\�������=�/7�/�������"�U�=�nq�s��"� ��+YG�����M4+e����)����F�b:C0W���"p17�)�}{	�Z	PT���RX����2���e�%aA�o�C��]
���Fmk�VS@�l�4&���'�8�m��\��`;U�		(P�R9S��Z
�1�\[)|��#W���x����}���$���x-!/���I���{1Y"�I���B�s0�@��0�j�~v�����'D>P����o0w��r��'�Sg)��f-��������b��1�\.y�?�����@�m��m�`���/�Xd}�':w��a����J�?��c��-�J�h`U�Gh�W�Gu�X&uE���4m������_+M�h����dy����cI2�|�_c�:���/�i�����Z6��"�/c�+C9����0#�`�$: 	�����	��]5����5����4�3���AB�@���uy�Q|��jf)�#����������uO?\v�+�I���-��6��r�����>�A�Y2��V��Qa��^�X�����G! h�J�����hz�v)f�Fj�!d:�����^�S5�����`=�w�q���[1�g�q���r11~7u�����"�#f������;���d�8��P�i�W#����!t����u�Ix�u� �YPf�tK�"��D��g*y�>�dw*L>=��!�%��(An�{P�e�t���^�I���k64tO:�#����9tqF����uU�w�"��BL7C1|N'���Y%��������A�:Z0�6�ON:����?�h*����|2Y:������e��>^�vw(O�2���������e���[���m�����W�`����:�]��aO��B	g6.���n��Z{o����jS7�4������{O�����~��'�;��8����[��8\��~�J��X��+X�y��e]��Fq��`H�j�.H}��=�:V�*���e��\V�|Y�3fs�b���w��/"s��eW�\At��q��GF%�x��5%����`8�LH����0�N���n��W�4a`��D�3���sWZ����-�#$���%��M��R���6�K~G4`j��51�V=��gr��y�ae�������y����Q6l�qa���QB���*�����-��Z*0�|3vo�~��t�j��2����L��e����� #*\�w��f�IB����6��i�5r�X���x�:R�@�k��U���Z����~���j�,��JX9���=����������� s>����[��
/*�W"
%_�K��:����&��;�V���Y_�	�+�om51�Ok����A�U��xP���HI��A�=��:9��kXf3w�U��|-�0��*WP�|Q��ns�T{�+��]�S���hFgY��'����Q��g:�'�G	���*��\�(C�d[���Z���yT�d��rzM��Hp#d4{�W�W��@�e�O�K�
O�b:K�o��5�4"h�@(���q&�������%'"����q�qu/�j���lh"�8��!Y�:�&S�#R7�}�'��6|��@�}��#��=��\uk��Om��O(����d��������20UR<l�0g��N������5#X�i�^I�����9�A���V����
&�Kz��|����=��.����mo{D�E�iv`��QUZ��j4;��#b3��@(6�����Y�:�vs�l�tJ���=�4}���+`
��Q�U>���E������������:��x�%��P_��x���L��u�RL'��D���>�v@[��)cJO��D���� 1� ����kl�Ig�i|=��v���:p�������\B5�o�X�86�@-��^*�@����w��@�D<3��� ����-1������2�:���D/:
>�Rq2��I*��^W$�����,h�-EX[�������(��R�]60� ��#m2�g����4��L���{�����P�Ea�h���1�m�o"�8`�ZV��%���k1�r�K��C�n�[��xs�,��CoV����l��
?+&�:3'��������e��u3��W��G>uv. 0�`���+�&&IJ����+���>b������y=���x3=6_TH8��2����$(�AQe�fE�{���cf���v�}��9��vg�[��rC�/4�!��n��g��;^�pF�w1�����d<�k-�����/sC�	6J�'�b���w��e�4�\�g��[q����j��������7r/��i���>���l������9u�j(�������%�Hwc���#�����tL;�\�5���\����q3�..�eQ�(m6��GQ!�F�+�����Z���0Q�Ds�j6�U�($	Ph��@u�458~9����M)O)K�'��(.:��������CF'�H�og&#�������z�6���� �|��rS�J���r*F3+L�J$�T�IW��K��|uq���tdK���a	cu���g��fs������y�Pi���^,fj5���{$�%c\����'���U�3\q�zG�������{������<�3$�D	NJ���M5�W�j6�G��RF�9;N�XJ���1)2/�&*������
V�-]3��(�Bz�0�u�q�e�~��r��v���F]Ou�}�a@{�������/G���~����[�`Zf��]�������v"�5)�tW�0y�����n�:
��&j���c�wrF���tR������1B��>�}�qx7%�bhz=�4�/�`B����E[��G�,D2OoF�Y��F�9
��<��o�����&�������S�Y��5VW-	.$ ������
55G�
jj��zj��UG����U:+PaS��V�2T�@��qvG7��@���-�O���;
��/ ���\������I���r6�f����x,��M�_�������njnk,X%��:N�V;���={r�F�������p/D,��0PH���h��N#����_�G�TY&RAn`�W�,��~G����0��;Y��s�@�/5�j�5O82����8������}q\��;�<��3cVY��M�H(���B�{�X��S#�
%�(�d�!?���p�>.�O^�U2�����w�w���=�'T�F�`�Y9�Eo�;mv���i��?�C"�a$��oQo��$�hv2xj6� ��OR�|����h�����`aR����p\���bM�='+5��~�G�QPS�q��N6P44K����@}_m��n��m~�hS�s���p�mN�B�
��FF�d�}�6������^W�����s����q�$�w@	�A�x]W�IP��/��v��y�A��=9=�tO.�V\
mZ�k9�\��9
>����;�_������3��L�<=��4-� �S�#Z��G��oB�H����\9�e��������� #9@�@(���>�aRx��������p��gj�F��k���j�Q����'7�/����H�����M�V|<��Q>�"�s-�m��eyH�����/���ry�(�D��*�=��{'P���o���\���rm�8��[B*���'x��Q�g�^4L�O�I��9�%�As�]��)n7�a��s����l�P���V��hl5����,[�Np�H�g���;����O��M@d7L�����]�s���1z�����f�c�t_�2�Nw������f�q����Q��4�V��s���w�`��e"�`��w�O�).�K�����lm���]�o�����f1OqKy<������0S����������p��y��]����c��
�`}��]�\�~�}]��:��U��4������uD	���tVN��M�J��R~s���\�'/�����E�qvx��{y�d����^v Z������03����&yA��(��(�N��dX$��j������C����x����,�����?99���d��f��'jBGa:�u&��,�yJ0�5;��n�^��
vw�v�	��F��|�0>g�I:�/A��{Lv��n<�U@3�����g��[���+L�A�����;Gp:a���:��GR����FA�`O�h��I���A�l�Y��vi��dkq���l/n��b�[���Z�I�,]�&HO����m������""�F�(�p)2r�3J�Z@Ne��������<��B�B����G����[������s����������Fo����9���<��� ��-�w�����:^C*��i8FPDD�BQ����t��}v�s����-v�W����y�}%���a�3{w�Da�M��e2q�X�pE��)Z�}�r���wO���T|��7<���y]�{;�����A�n{D�����!c��@��M�s��%��]�
/I��� O��Z����V��y/�����
�'�A�
Q������w�BJ�����Q�@���;Jp�SY�����A�����J��]J�����5I�b0�3�`&���W�����
��X�x��������&���2�/����#�������k�0^����=��O�"�\K��j�w�'*t?{}�T��;4��OU�5�������z�S�n�H#�R�H�;��x�2P�qE;�U�;
�x�R��dC^�Y v;��W��%`�p�z>�����a|6�mh�
�?,�2���%[�� ���d+k|Ms��S���~�������V���hz�tY�
�+E�'(�
2h�S�:�2�������~��pT�o\I�~�ke��H�?�F)�{�nS�
���,�MH)�:�����r�������6 �9N�g4�t��a�%� 7��R�,_��@N��<9��}E��,�����-?L.E?3�6�NiH;��9> pb���<��X�2�1���1[��1�(}��@����hf3�YL�`0A������8Tw�tf��S����������:�8{*K:����l����lVS��0H>T������0'�L����f�

Y,��=�eJ�d���\�A����T���4aYQ11�N�R���y�Q���j{�S�dv�gb�F��Q{�C�y�C��W�
�u�o��Z�8��J��C���|�rO��L�`�!JVI��Z9�%�<����\�i8�4����x?X�g��0�]k��.)��y�|�pM_�r���/�)�n�-��_A9,^Z�W@��������O.:��TF��"|�&�}���J�X�����"��F��
>�6��*�F@�&����'s|����+��e���J4J�!�
J�cj[���������:�����?����V���:�@$^����]���:���������!w��j��m(�����Q����9�7_il3���bl- ������Q��U�RQ�����/�{�_�j��`���t���%�B�S��S�+�,������5�P�7�''����.��]��Ke7�]����/X(���BG�W��9����8*c�y��GJv�Z��F���#y��9��s��f��������^o�{��F�=�*������"���a��M����B��Q�~2������~����NW����St�5@Pd�X'r��P�;�}�o��+t��e�-N���	��[S�*X�d���M-��hP��"��_��R�c�k���^���zb,jg	��.��f
P;�,��fk���]H�|Do~����@���,f2�ZT��i�_�Z,��8)k�t���lK�^���>BI���W�����P���^���5�HU�[��s����v�vvv���h;��W%L1�B���e�������������L���4r<����\!�Rl>��_P�^se��'�%��^�����1t����J�V��J]f����o�c 
��b�b/{��e������W3���0�+I|2$��<w��f�X�V����7(&O]1O����l������ ��SSM����w�������^k�xR�Y�����K��J��K�nyzl[>!�����o��qz��6PT��?@_��?Ob��������?���|�����������?���|����������:q�Op

#41

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#40)

Re: WIP: [[Parallel] Shared] Hash

Hi,

On 2017-03-07 02:57:30 +1300, Thomas Munro wrote:

I'm not sure why nodeHashjoin.c is doing raw batchfile read/write
operations anyway; why not use tuplestore.c for that (as
tuplestore.c's comments incorrectly say is the case)?

Another reason presumably is that using tuplestores would make it harder
to control the amount of memory used - we do *not* want an extra set of
work_mem used here, right?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#40)

Re: WIP: [[Parallel] Shared] Hash

Hi,

0001: Do hash join work_mem accounting in chunks.

Don't think there's much left to say.

0002: Check hash join work_mem usage at the point of chunk allocation.

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 406c180..af1b66d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -48,7 +48,8 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 						int bucketNumber);
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);

-static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc(HashJoinTable hashtable, Size size,
+						 bool respect_work_mem);

I still dislike this, but maybe Robert's point of:

On 2017-02-16 08:57:21 -0500, Robert Haas wrote:

On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund <andres@anarazel.de> wrote:

Isn't it kinda weird to do this from within dense_alloc()? I mean that
dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies. Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

I haven't looked at this, but one idea might be to just rename
dense_alloc() to ExecHashBlahBlahSomething(). If there's a real
abstraction layer problem here then we should definitely fix it, but
maybe it's just the angle at which you hold your head.

Is enough.

0003: Scan for unmatched tuples in a hash join one chunk at a time.

@@ -1152,8 +1155,65 @@ bool
 ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
 {
 	HashJoinTable hashtable = hjstate->hj_HashTable;
-	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
+	HashJoinTuple hashTuple;
+	MinimalTuple tuple;
+
+	/*
+	 * First, process the queue of chunks holding tuples that are in regular
+	 * (non-skew) buckets.
+	 */
+	for (;;)
+	{
+		/* Do we need a new chunk to scan? */
+		if (hashtable->current_chunk == NULL)
+		{
+			/* Have we run out of chunks to scan? */
+			if (hashtable->unmatched_chunks == NULL)
+				break;
+
+			/* Pop the next chunk from the front of the queue. */
+			hashtable->current_chunk = hashtable->unmatched_chunks;
+			hashtable->unmatched_chunks = hashtable->current_chunk->next;
+			hashtable->current_chunk_index = 0;
+		}
+
+		/* Have we reached the end of this chunk yet? */
+		if (hashtable->current_chunk_index >= hashtable->current_chunk->used)
+		{
+			/* Go around again to get the next chunk from the queue. */
+			hashtable->current_chunk = NULL;
+			continue;
+		}
+
+		/* Take the next tuple from this chunk. */
+		hashTuple = (HashJoinTuple)
+			(hashtable->current_chunk->data + hashtable->current_chunk_index);
+		tuple = HJTUPLE_MINTUPLE(hashTuple);
+		hashtable->current_chunk_index +=
+			MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
+
+		/* Is it unmatched? */
+		if (!HeapTupleHeaderHasMatch(tuple))
+		{
+			TupleTableSlot *inntuple;
+
+			/* insert hashtable's tuple into exec slot */
+			inntuple = ExecStoreMinimalTuple(tuple,
+											 hjstate->hj_HashTupleSlot,
+											 false); /* do not pfree */
+			econtext->ecxt_innertuple = inntuple;
+
+			/* reset context each time (see below for explanation) */
+			ResetExprContext(econtext);
+			return true;
+		}
+	}

I suspect this might actually be slower than the current/old logic,
because the current_chunk tests are repeated every loop. I think
retaining the two loops the previous code had makes sense, i.e. one to
find a relevant chunk, and one to iterate through all tuples in a chunk,
checking for an unmatched one.

Have you run a performance comparison pre/post this patch? I don't
think there'd be a lot, but it seems important to verify that. I'd just
run a tpc-h pre/post comparison (prewarmed, fully cache resident,
parallelism disabled, hugepages is my personal recipe for the least
run-over-run variance).

0004: Add a barrier primitive for synchronizing backends.

+/*-------------------------------------------------------------------------
+ *
+ * barrier.c
+ *	  Barriers for synchronizing cooperating processes.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * This implementation of barriers allows for static sets of participants
+ * known up front, or dynamic sets of participants which processes can join
+ * or leave at any time.  In the dynamic case, a phase number can be used to
+ * track progress through a parallel algorithm; in the static case it isn't
+ * needed.

Why would a phase id generally not be needed in the static case?
There's also further references to it ("Increments the current phase.")
that dont quite jive with that.

+ * IDENTIFICATION
+ * src/backend/storage/ipc/barrier.c

This could use a short example usage scenario. Without knowing existing
usages of the "pattern", it's probably hard to grasp.

+ *-------------------------------------------------------------------------
+ */
+
+#include "storage/barrier.h"

Aren't you missing an include of postgres.h here?

To quote postgres.h:
* This should be the first file included by PostgreSQL backend modules.
* Client-side code should include postgres_fe.h instead.

+bool
+BarrierWait(Barrier *barrier, uint32 wait_event_info)
+{
+	bool first;
+	bool last;
+	int start_phase;
+	int next_phase;
+
+	SpinLockAcquire(&barrier->mutex);
+	start_phase = barrier->phase;
+	next_phase = start_phase + 1;
+	++barrier->arrived;
+	if (barrier->arrived == 1)
+		first = true;
+	else
+		first = false;
+	if (barrier->arrived == barrier->participants)
+	{
+		last = true;
+		barrier->arrived = 0;
+		barrier->phase = next_phase;
+	}
+	else
+		last = false;
+	SpinLockRelease(&barrier->mutex);

Hm. So what's the defined concurrency protocol for non-static barriers,
when they attach after the spinlock here has been released? I think the
concurrency aspects deserve some commentary. Afaics it'll correctly
just count as the next phase - without any blocking - but that shouldn't
have to be inferred. Things might get wonky if that new participant
then starts waiting for the new phase, violating the assert below...

+	/*
+	 * Otherwise we have to wait for the last participant to arrive and
+	 * advance the phase.
+	 */
+	ConditionVariablePrepareToSleep(&barrier->condition_variable);
+	for (;;)
+	{
+		bool advanced;
+
+		SpinLockAcquire(&barrier->mutex);
+		Assert(barrier->phase == start_phase || barrier->phase == next_phase);
+		advanced = barrier->phase == next_phase;
+		SpinLockRelease(&barrier->mutex);
+		if (advanced)
+			break;

+		ConditionVariableSleep(&barrier->condition_variable, wait_event_info);
+	}
+	ConditionVariableCancelSleep();
+
+	return first;
+}

+/*
+ * Detach from a barrier.  This may release other waiters from BarrierWait and
+ * advance the phase, if they were only waiting for this backend.  Return
+ * true if this participant was the last to detach.
+ */
+bool
+BarrierDetach(Barrier *barrier)
+{
+	bool release;
+	bool last;
+
+	SpinLockAcquire(&barrier->mutex);
+	Assert(barrier->participants > 0);
+	--barrier->participants;
+
+	/*
+	 * If any other participants are waiting and we were the last participant
+	 * waited for, release them.
+	 */
+	if (barrier->participants > 0 &&
+		barrier->arrived == barrier->participants)
+	{
+		release = true;
+		barrier->arrived = 0;
+		barrier->phase++;
+	}
+	else
+		release = false;
+
+	last = barrier->participants == 0;
+	SpinLockRelease(&barrier->mutex);
+
+	if (release)
+		ConditionVariableBroadcast(&barrier->condition_variable);
+
+	return last;
+}

Doesn't this, again, run into danger of leading to an assert failure in
the loop in BarrierWait?

+++ b/src/include/storage/barrier.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * barrier.h
+ *	  Barriers for synchronizing workers.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * src/include/storage/barrier.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BARRIER_H
+#define BARRIER_H
+
+/*
+ * For the header previously known as "barrier.h", please include
+ * "port/atomics.h", which deals with atomics, compiler barriers and memory
+ * barriers.
+ */
+
+#include "postgres.h"

Huh, that normally shouldn't be in a header. I see you introduced that
in a bunch of other places too - that really doesn't look right to me.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: Andres Freund (#42)

Re: WIP: [[Parallel] Shared] Hash

Andres Freund <andres@anarazel.de> writes:

+++ b/src/include/storage/barrier.h
+#include "postgres.h"

Huh, that normally shouldn't be in a header. I see you introduced that
in a bunch of other places too - that really doesn't look right to me.

That is absolutely not project style and is not acceptable.

The core reason why not is that postgres.h/postgres_fe.h/c.h have to be
the *first* inclusion in every compilation, for arcane portability reasons
you really don't want to know about. (Suffice it to say that on some
platforms, stdio.h isn't all that std.) Our coding rule for that is that
we put the appropriate one of these first in every .c file, while .h files
always assume that it's been included already. As soon as you break that
convention, it becomes unclear from looking at a .c file whether the
ordering requirement has been satisfied. Also, since now you've moved
the must-be-first requirement to some other header file(s), you risk
breakage when somebody applies another project convention about
alphabetizing #include references for all headers other than those magic
ones.

In short, don't even think of doing this.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Tom Lane (#43)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Mar 8, 2017 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:
+++ b/src/include/storage/barrier.h
+#include "postgres.h"
Huh, that normally shouldn't be in a header. I see you introduced that
in a bunch of other places too - that really doesn't look right to me.

That is absolutely not project style and is not acceptable.

The core reason why not is that postgres.h/postgres_fe.h/c.h have to be
the *first* inclusion in every compilation, for arcane portability reasons
you really don't want to know about. (Suffice it to say that on some
platforms, stdio.h isn't all that std.) Our coding rule for that is that
we put the appropriate one of these first in every .c file, while .h files
always assume that it's been included already. As soon as you break that
convention, it becomes unclear from looking at a .c file whether the
ordering requirement has been satisfied. Also, since now you've moved
the must-be-first requirement to some other header file(s), you risk
breakage when somebody applies another project convention about
alphabetizing #include references for all headers other than those magic
ones.

Thanks for the explanation. Will post a new series addressing this
and other complaints from Andres shortly.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#42)

2 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Mar 8, 2017 at 12:58 PM, Andres Freund <andres@anarazel.de> wrote:

0002: Check hash join work_mem usage at the point of chunk allocation.

Modify the existing hash join code to detect work_mem exhaustion at
the point where chunks are allocated, instead of checking after every
tuple insertion. This matches the logic used for estimating, and more
importantly allows for some parallelism in later patches.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 406c180..af1b66d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -48,7 +48,8 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
int bucketNumber);
static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
-static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc(HashJoinTable hashtable, Size size,
+                                                bool respect_work_mem);
I still dislike this, but maybe Robert's point of:

On 2017-02-16 08:57:21 -0500, Robert Haas wrote:

On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund <andres@anarazel.de> wrote:

Isn't it kinda weird to do this from within dense_alloc()? I mean that
dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies. Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

I haven't looked at this, but one idea might be to just rename
dense_alloc() to ExecHashBlahBlahSomething(). If there's a real
abstraction layer problem here then we should definitely fix it, but
maybe it's just the angle at which you hold your head.

Is enough.

There is a problem here. It can determine that it needs to increase
the number of batches, effectively splitting the current batch, but
then the caller goes on to insert the current tuple anyway, even
though it may no longer belong in this batch. I will post a fix for
that soon. I will also refactor it so that it doesn't do that work
inside dense_alloc. You're right, that's too weird.

In the meantime, here is a new patch series addressing the other
things you raised.

0003: Scan for unmatched tuples in a hash join one chunk at a time.

@@ -1152,8 +1155,65 @@ bool
ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
{
HashJoinTable hashtable = hjstate->hj_HashTable;
-       HashJoinTuple hashTuple = hjstate->hj_CurTuple;
+       HashJoinTuple hashTuple;
+       MinimalTuple tuple;
+
+       /*
+        * First, process the queue of chunks holding tuples that are in regular
+        * (non-skew) buckets.
+        */
+       for (;;)
+       {
+               /* Do we need a new chunk to scan? */
+               if (hashtable->current_chunk == NULL)
+               {
+                       /* Have we run out of chunks to scan? */
+                       if (hashtable->unmatched_chunks == NULL)
+                               break;
+
+                       /* Pop the next chunk from the front of the queue. */
+                       hashtable->current_chunk = hashtable->unmatched_chunks;
+                       hashtable->unmatched_chunks = hashtable->current_chunk->next;
+                       hashtable->current_chunk_index = 0;
+               }
+
+               /* Have we reached the end of this chunk yet? */
+               if (hashtable->current_chunk_index >= hashtable->current_chunk->used)
+               {
+                       /* Go around again to get the next chunk from the queue. */
+                       hashtable->current_chunk = NULL;
+                       continue;
+               }
+
+               /* Take the next tuple from this chunk. */
+               hashTuple = (HashJoinTuple)
+                       (hashtable->current_chunk->data + hashtable->current_chunk_index);
+               tuple = HJTUPLE_MINTUPLE(hashTuple);
+               hashtable->current_chunk_index +=
+                       MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
+
+               /* Is it unmatched? */
+               if (!HeapTupleHeaderHasMatch(tuple))
+               {
+                       TupleTableSlot *inntuple;
+
+                       /* insert hashtable's tuple into exec slot */
+                       inntuple = ExecStoreMinimalTuple(tuple,
+                                                                                        hjstate->hj_HashTupleSlot,
+                                                                                        false); /* do not pfree */
+                       econtext->ecxt_innertuple = inntuple;
+
+                       /* reset context each time (see below for explanation) */
+                       ResetExprContext(econtext);
+                       return true;
+               }
+       }

Ok, I've updated it to use two loops as suggested. I couldn't measure
any speedup as a result but it's probably better code that way.

Have you run a performance comparison pre/post this patch? I don't
think there'd be a lot, but it seems important to verify that. I'd just
run a tpc-h pre/post comparison (prewarmed, fully cache resident,
parallelism disabled, hugepages is my personal recipe for the least
run-over-run variance).

I haven't been able to measure any difference in TPCH results yet. I
tried to contrive a simple test where there is a measurable
difference. I created a pair of tables and repeatedly ran two FULL
OUTER JOIN queries. In Q1 no unmatched tuples are found in the hash
table, and in Q2 every tuple in the hash table turns out to be
unmatched. I consistently measure just over 10% improvement.

CREATE TABLE t1 AS
SELECT generate_series(1, 10000000) AS id, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

CREATE TABLE t2 AS
SELECT generate_series(10000001, 20000000) AS id,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

SET work_mem = '1GB';

-- Q1
SELECT COUNT(*)
FROM t1 FULL OUTER JOIN t1 other USING (id);

-- Q2
SELECT COUNT(*)
FROM t1 FULL OUTER JOIN t2 USING (id);

master: Q1 = 9.280s, Q2 = 9.645s
0003-hj-refactor-unmatched-v6.patch: Q1 = 8.341s, Q2 = 8.661s
0003-hj-refactor-unmatched-v7.patch: Q1 = 8.186s, Q2 = 8.642s

0004: Add a barrier primitive for synchronizing backends.

+/*-------------------------------------------------------------------------
+ *
+ * barrier.c
+ *       Barriers for synchronizing cooperating processes.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * This implementation of barriers allows for static sets of participants
+ * known up front, or dynamic sets of participants which processes can join
+ * or leave at any time.  In the dynamic case, a phase number can be used to
+ * track progress through a parallel algorithm; in the static case it isn't
+ * needed.

Why would a phase id generally not be needed in the static case?
There's also further references to it ("Increments the current phase.")
that dont quite jive with that.

I've extended that text at the top to explain.

Short version: there is always a phase internally; that comment refers
to the need for client code to examine it. Dynamic barrier users
probably need to care what it is, since progress can be made while
they're not attached so they need a way to find out about that after
they attach, but static barriers generally don't need to care about
the phase number because nothing can happen without explicit action
from all participants so they should be in sync automatically.
Hopefully the new comments explain that better.

+ * IDENTIFICATION
+ *       src/backend/storage/ipc/barrier.c
This could use a short example usage scenario. Without knowing existing
usages of the "pattern", it's probably hard to grasp.

Examples added.

+ *-------------------------------------------------------------------------
+ */
+
+#include "storage/barrier.h"
Aren't you missing an include of postgres.h here?

Fixed.

+bool
+BarrierWait(Barrier *barrier, uint32 wait_event_info)
+{
+       bool first;
+       bool last;
+       int start_phase;
+       int next_phase;
+
+       SpinLockAcquire(&barrier->mutex);
+       start_phase = barrier->phase;
+       next_phase = start_phase + 1;
+       ++barrier->arrived;
+       if (barrier->arrived == 1)
+               first = true;
+       else
+               first = false;
+       if (barrier->arrived == barrier->participants)
+       {
+               last = true;
+               barrier->arrived = 0;
+               barrier->phase = next_phase;
+       }
+       else
+               last = false;
+       SpinLockRelease(&barrier->mutex);
Hm. So what's the defined concurrency protocol for non-static barriers,
when they attach after the spinlock here has been released? I think the
concurrency aspects deserve some commentary. Afaics it'll correctly
just count as the next phase - without any blocking - but that shouldn't
have to be inferred.

It may join at start_phase or next_phase depending on what happened
above. If it we just advanced the phase (by being the last to arrive)
then another backend that attaches will be joining at phase ==
next_phase, and if that new backend calls BarrierWait it'll be waiting
for the phase after that.

Things might get wonky if that new participant
then starts waiting for the new phase, violating the assert below...

+ Assert(barrier->phase == start_phase || barrier->phase == next_phase);

I've added a comment near that assertion that explains the reason the
assertion holds.

Short version: The caller is attached, so there is no way for the
phase to advance beyond next_phase without the caller's participation;
the only possibilities to consider in the wait loop are "we're still
waiting" or "the final participant arrived or detached, advancing the
phase and releasing me".

Put another way, no waiting backend can ever see phase advance beyond
next_phase, because in order to do so, the waiting backend would need
to run BarrierWait again; barrier->arrived can never reach
barrier->participants a second time while we're in that wait loop.

+/*
+ * Detach from a barrier.  This may release other waiters from BarrierWait and
+ * advance the phase, if they were only waiting for this backend.  Return
+ * true if this participant was the last to detach.
+ */
+bool
+BarrierDetach(Barrier *barrier)
+{
+       bool release;
+       bool last;
+
+       SpinLockAcquire(&barrier->mutex);
+       Assert(barrier->participants > 0);
+       --barrier->participants;
+
+       /*
+        * If any other participants are waiting and we were the last participant
+        * waited for, release them.
+        */
+       if (barrier->participants > 0 &&
+               barrier->arrived == barrier->participants)
+       {
+               release = true;
+               barrier->arrived = 0;
+               barrier->phase++;
+       }
+       else
+               release = false;
+
+       last = barrier->participants == 0;
+       SpinLockRelease(&barrier->mutex);
+
+       if (release)
+               ConditionVariableBroadcast(&barrier->condition_variable);
+
+       return last;
+}

Doesn't this, again, run into danger of leading to an assert failure in
the loop in BarrierWait?

I believe this code is correct. The assertion in BarrierWait can't
fail, because waiters know that there is no way for the phase to get
any further ahead without their help (because they are attached):
again, the only possibilities are phase == start_phase (implying that
they received a spurious condition variable signal) or phase ==
next_phase (the last backend being waited on has finally arrived or
detached, allowing other participants to proceed).

I've attached a test module that starts N workers, and makes the
workers attach, call BarrierWait a random number of times, then
detach, and then rinse and repeat, until the phase reaches some large
number and they all exit. This exercises every interleaving of the
attach, wait, detach. CREATE EXTENSION test_barrier, then something
like SELECT test_barrier_reattach_random(4, 1000000) to verify that no
assertions are thrown and it always completes.

+#include "postgres.h"

Huh, that normally shouldn't be in a header. I see you introduced that
in a bunch of other places too - that really doesn't look right to me.

Fixed.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-shared-hash-v7.tgzapplication/x-gzip; name=parallel-shared-hash-v7.tgzDownload

�C�X��Kld[�(�~�����	F-87���3>��a��fV;����7�������^)t"��}nF���s"}�U�1&��	�b�HL�I����1Ab��'���~�w>���UM�Ci���]{���^��?��`P���I��^��U�����=~������G?[����������V����jmy���N}s����A}�q�O`(��h�(����e���O>�����$��W�W;�dxV�I��_����jmm�����P����������Yf���5_�N�g�o���&��wVV������������?��2$�3���/�Amu���r���~:��Q�wEhw?�}�<|�:���f�Ad�'�����?���=��������y�<|>��������$��_���������������'���������W���4����� �{���������y�<|>��������y�<|>����VV���/���������������5|������++�!|�-����=�o����_������W��������O@���7���?����++�|�����m����}���$|��������_�w��������o��������|����?���������9|�&|��te������������������?�����++�"|���
|w��w������������6|�
��������7`~��g��?���_����=�� N���%W�7����M��`����O�{�.+^'��Sx&�(z��D^?��h�PS��h%=�q��W����`��I@}ta0J�n����x9�Z^8
��Pc0��$�q�M�����(�J����2����?I�n8��i����5I����w�������
���1b�E�;���`�8�`��u��xv�$�F��$�(�`9��(���������G��f�	^$�S[aL��	�T��*�p�^$��&�]A(���{\SS�}�>��5v%��Qt=��c�?�FIk�<�����a������6����}��d���x�Zm}_��lw����T����/���e�c�*0^O�$���^?���K�s@@�"\��.?��n�����H��`DbX�k%�����]��G��jW1�f�����Z`���2��C(H��r�6A���+�=�����?�Y�p�����
m��d�����QcC�.�a<����1�|��y�h��w�7�;�
��h����-|M>zP���8�M_X;D���1�����.��?���\��Q8J�����@�d:�����D�J�1Q�Cb�v h��0�%�-'3x���������6>.]b�����Gr�m9���4�"�h�k
_��k�A������1�i��Xt5?$�M
{H�=e�������p�)|�e�,�,�/w��	h���,�y�b�������5��Q�5�@;������-|� ��l9�����p}%��th�W@������o���`���:N&aw�"�7@ &��/�.��Z���2�$�@\�d�TXB��#���{d���&�k���AV��Y�k��#�����~dz��L����U.�����L:���A�;�h.���:%xn�;3�j��j*}�TUp���3��*4�
0	�D�*rsq�CD���� [���WOV!�����	����f^^%�Zw�k��mU���2qe���0�8Ln��!�u�e�sjU���NJ�	,������D���b�q"���Vc5�U}��L4�����"8�|:�4�j�����6�V����'�U���WnT��m�O�d�\.{��x����g�'���6�1>SC���F/��1�+�*?xby_���;��A�8l��A�l
2�^����W�^��:�����x�p�~�~/��<�unj<G`��%x�j������`����/��$��o����V��/�9���E�����+�c��C�1 ���u�8���P�Ui������oL���Z���i8����:����1@��.	+W<��^�m������������
?GT��"�+��1������{�Zm���u����E����Vagd��wH�&����H*^.�g�@�dZ]xmCbs���.x��#b���|���`�����0�s�,0���CEtbX��4�����wH�.^���w���bX�P�qlh'�9a��&l Q,�0�.��)����n}
���A�-Rm�)QQ���VVzp���x�J�^��/�9�,������!���~���,��Ep�?����������Ro��w��>|>��������y�<|>�����w�S�	\=���yO�gg�Gg�������J9���(&V��w��pM�yO���Z��*(*�U�EU�%il� �W�n�Tg��.��f�tmTk&�p���6>EaO�q<�m�4���T	&1V^)��*�_��@�?�'�������?������z��~����w�hR�hrS����
G��������h���%��Vc������[;��~��J��X������U�2���W����V*��`�����0��������%]�d��,�O��R��,8G���G})iM��������e@�c���'Q�����A	7%}Yr��mM2�j�G��?�ZdV`�K$R�F�����<�(�TY�7���;���������VC��8��T�d�C��1�hQ����3�x�M����%r9�s�����=h���j����.y�%k��S�'m�u0���7���>8�>|��������Qq���b�}��k�!<�E���<��dj2d����{�K�T��N��Q|R�m6�}Q!���+��2L*$��.J�`�Cm��}�`�\��G?�1j9Qm������/�O���Aw

n�W_�fT9%J,�Z{����Zm���6v�"�.e�ymd��y�P�����4^�P���GP���	B�)V\/y�p���6L��W��:S�!��/�y9�����k��f�G�Iu2��b�����W���a} {E��S�6���w��$@�5��w���7�'�����?���u8��)kT�^9w������4�o��\�&�1U��6M=������u���_x�2���@�f�(��G�����p'����*�=L�WW��|�LI�1�y�D�(�W�b�&�@S�3��csg�������!z�%��
���I@�b��xo�P{O��S�YFh�B\�� �L��e��@��s��=�	�i����-�M9��y�8[u�b�i��?h�< �R||�=By����Y�8��y"ms�$&5>�
k�ACM�!����|��v�
xR�'t�Ut���z���NK��lT�^y��Wi�E;���8�KwPXCS��N���<��:�%W�jp��*
�����3�^���G��D�C��?���8�v�`�y	(���d<�.����,4�k�����$n�l�F:��mb����!�`C��1��.������)L�/�5*���q�y�U��y���x���Q��gGg��^z�J�o�;xs����9��[8r��_�q��*���i�hL��`���$�jU��#��f���OQ>[����
	�>.��s�d�x}vt�����u(�+�b�o��<��,�|i�T�������v_*��!3A��G��8P=�����}y ���i�s����x����3m������KX�E�y�a)��Z4	o�Z��@3YW����35����Xk����/�l����>=��8d��s��Hk�zf�\�(6<��pQ5�I;�-�"�i�������4��6|b�|<zL������A�)����4ij�U8~N�t�BX���9M�F���������'@��e�!D���w6w�P��d�#F^O�C��Y vb^^���C��'��'�+c��E�f
z����C��A0*�m4�1���+��C�'n_�V��� ��� ���4�E0�oL<�<��qm�5�	b�Y?�W�2�X��$��v�MsEl������W"�iV��%�yK;vi�3g��r2y��Lv~��4�.�����+�:����^������;��{	�k0I�p������j��`��l��W!8���������#���d�a����E
��m.K�ftRN7���V��e�x�<U�[���`1p�0_�\�4���"�YO7�&�Z8K�M@�zk��8(D��x�(�h���w�X�^�D��!��	Xr���)�]��h{c@�}��\�"�a6���j�R1m���|�EB��}:��������p u�.":~H��nn����o���(T�g��p��X���s��|��I��|j�}����m�/~$%[���.a"������AF�6�x,?+8
�����m����y��<3�����'�{-��5v;�f�vP��pZ����6�1�v�=^���\VD|6�K���DjK	���u�����q���T�'�s��e�VK/!���-�v���o��0��'�S��2��9�<���w��9��s��$������*"c�8�T��8�����`���2"X�'������Fg������oo�j���^cw3+������*�Z��&���Cr3zA}T�������e�����_?�J�(#��Y�l�!��7�������{um�������fn���:�1gn����	h;����
��V�V_
/'��~���|
���OG]����D����&@<$��t�q(�K	I9���h�v�2q%U$2P�7�)5$i���Aq�.�j���d��������M}sY�YXMSF��a�@��a���T�=�=#�1�$���cpM=���@D�yk��j���1�����{����7�?�Uu3��	�?G���jd���V�A����^p�4�Q�y��*
���*�h��Q@/H�nb�~�����Z2}���0P���u�R������G>�����������`�7X������$i��FoLT��CT��D�m_F�$W���`v���3)�E�\���F���}.������$�g#2ER�y<1���\���'����~^�gpN~
N�"=��P^@W���K^��}�C���eVY7	b��o+���9�w��,;c2�@�F�+�Vc�.2�A4�TBsd�S�Dk�/B%��)�S/��=r�W�#��.'�u"b;��|a��������oZ����.@H��,���^}�,j�:n�z����p�v|r�~{������������X��px����n�������(B������!0�����U ���ti�����}u�va�M�C���dl�.���Z>���3��^���L=K����o�t�V��(<
,�N��!`H,�!���$D���w6��u��jt#I"Oa	�)%`�P�x=�\��X�=O��`���w_���b��������<�NK'���6���+�Wx�g�b��Wo9��o��h�#b�VBI�Z�*j�������4�z+i8���.�MZ�2�l�Te�#t����DC��C�;��P6�q���a�5���Tb�,��R�I-����0�9
��`����{�{��KAz)��P��(6AN��'�����lWJ�I��j��N�S�p�@!`.��U�:���!�I��������x}�%�}&q������6���!�w�V�/i�����#�W�*��Q��'x�?����?���Y0��]r�Oq:����@��>5�$�
>A���
�{1�K&<N��IxS� 9�%dB��
)A�0�Z��2���2�.o
ZV�P^�.��K��r�JK����'$�'�}A��m����ol�Neb����Q��'��T�����������Py�s��:^���\�8M�]�8�%����DM�(�,�YZ�+�b��B*��.���c�NnH[���{dJ���;y������_��%�2��Q��+Y����z���f< Z�.��-��-|f�7��t�r���)Y����f=%����i<��s����M��J�����8i�
�����fe�W���`��\��l�)���O�4�T��Ue|�|`���j�nLJ1�"�P�����/1q4@��e���Y)�
���t�/'~HS/����-��\B]���j[�{��z��y��B����l+���$����h�^-�>-����{�*���c�������=�������=���p;�0`�,��Z`�������	��'��l1;	.��Kd@xAp�p=x8c����:�����q��������_
�w�V���������]L������� ���k�;z����	�Qf��}���t������������K,V����F����~�=�T4���Y�p���w����x����b����,����0srd(��^	\���QK?���nVvqom�I{�D%�aHY��a($O�}��]���N���E�A��+����ysC(��W����2�d,zN�����n��h@-r
�I7J�0�2�unjm��������F/����jm��nyI�����[��l���[��Cx|�9~��2��d�v�I�����mg��j~u�3A+
�s�~�Ec�&��������`��:-��7c,���y���e����C��'�_�y�)_m,Z� N�q j��W,�p����f��.a��^����L�������L�Z���$�g��.��D��!�_��g<,��6G��n!�1�V6+����l��(�b1�rZg`0�W�Q��A��u�E4�/V��<�mi+24�=QCXUm�����d�����"L��GYB���"�����#i��^���������e<y�����'���R��^r��$�0�GdL���"�y�8X�����������Gw�}i%	��l.����CZfP>�������%�(��03�|G���'WQ����e�J��Q1'��|y�p���rJ��������w>���Q<�/��RD���1E����&��L}��HX�W��������yA������o���*���:��;H��<�yQ�(��>e[�p����}�BD~�����(8�4��v:A!y)�
��W�r���W�,�n�k�rQ��N�w��u���>��bi�R��pM|4!vH���c�(���0����T��*�$�^�b
�/Zj�?���dD���9�0V@��� �Y#�s�����!{����!��'����T���E4Lf42>�2�o�\��$�3��q��������u�
�V���Z���%dP1h("�r�t),�P���}|1���v��As'GlqO������|kr�6�cM��T��b����A8�����������M�QF�hx��`c$�Uv� �Ao����.�[��>����y#�����������b�oN�O(�_G�_������%���B�3[���n�����j=���|n�3�T�S���}�J(��1{C��RJ�2�*fhn��43G�� �������M6R�����L
4{���+��N	S�k��RA��<��;'�)�%�)�'*9Y�����!���t�I�����U*�2;�n�^�D�ii��UzX��V�t��J����t_	�J����t��J����t�y�J����X2B����
�t�yKnH=�{�2����$�L7v����{�Z��|���L�Z��D�
��+O�s�-�+Wci:��~8#ko�b���KT,�sV���%l.�c��&�K����e�.�gB���%*����P�	���'�	�������4bE@������)z�2���e|�����D��"��F9����cT�Y,��:���k����@�(��H����aL��"��;������R0�c������=S�c��X/�QT�L����2���yS��RLwh���r"<��tA�s�*�������!�����Q��K����-J��f_�7��c����vo�l/�����L�VIQ�B�
�;�	�b
D(2 nz ����-���.�d�jE������~�����]�kJ���������W����'r_�����zu$��O�����/vq����B�H���o(�Pq|�_�#1��.^����)M t�#��/b��sD�����g��0���x�����U|Bm8�e���e�1u:���v�5�3)T�����u�&��|��D��TKay��p��q�~������S~2��L}!]�4�!%�6_
�G3>8���c`f��aF�����`�YS��v'D���[��B5����������l��SE$�cz|�F�m��
(�1���n�1KL�C9$��HH�� ��&�+[E�{�g����h�B��E
j��@#�|��w�M�����l_��kWffg�.��u�	�	3nq0�*������#�4Z22�5
�:���u��ec� ����Hdm���.�zcu#�K�^u}���
���ZO�6��~��+i)��$&��-I��e�Z8oLr�#�����B���bx(d1�4|������������ ww[�A��U|o�D��U������r9��U��,=x�5�n�&��bA%���sZ�Y���$m�Jv���B���L����M=2�3@j�w�Z�.�����0��_WS�Ac|+ �N��n�m������i!��i
��}1��pR�	��r���3�S��b�~
��Di�i��s�[Q�4}�1����&��q��`<ike�@�aO[����z�`�6?�N`�>��������I�q�E�i|4�0+[[AK�p`��3�H{*���j��P'�G�#{����t$�S�8)��,	\����tB�H�'�o/,����J�q��9��\@��,���M�����"�jU�����K�~�k/[�����!C������f�w6AT��E��7��������R'T)��!�9�&������9	#����oL27GC����fd.t���l@�����s<W���������d�X��O<��J�����b�^`�j�06�x,��<JtM�i�!�I���m���C�t���u�����b�P/����Wc���cC�9��a�S���bed�iff�U�l{C�K>�j�hi��%YRd2���g�U���e"���^@�>��r����|yG���c�{kc�R�
<~���'M/�)���J���+1[L���aY�b�"S����[�)�g���q
�����bH�K(��D9������$Q��(������&��T:����O��$�+�x�
)]�)���
����|�G}S��Dq#H��X!�?�O9��s�6�`D�x������������5m�:�y����L/i���d���^�2��dk&��uJ�F
�������V�	Z��7�~!��������N�iJq��Y�V��[�\��Z���g��$����['�6���Gz�����d)U	8��`�"���`����\��EM<B��
?��a7�B,��p��F^VHV"�6�9h�"���+�'h�.$s���hO���
��E�#Yo��(A�o�+��*���,�.�d����!_u�f������aa��������������� 
�o����6���J9JQ�R���j8��".l>��{��]������y���M|�)7����o=Xa:N��A������h�>�����R�P�JO��x�.�69G�����2�a D�����8�����u������F7�=�5��������$����K�G��Rc:I�=Y8��2+|ye	����J��������N�e0(GP���=��'S>�o�2�����/���3n)g�����k��Ng���.g&���X3��My������5#E�(1���8y���W���	�����Dt:#9B�6���h��+B8���H^��8nm=��U{`��������������|z:���@xU�5L��oGn���^����L��V��~�������f���%[���y�K���V��H��[��3#X��cHnk��x���	��D���5�c�y��/���uf���,2�S�S]�,�������������#k"���z��7]��U���������!��0oP�MvL�Z3�,^vPh54{X/�d���O9oh�mfhV��ZN+�(�p���k[���������v2	���?��[=��df������
.L~O,��?�$S��tLn���h�fR���9MK��_�x���d
/Nxo�M�:�q/���D����4rR@�����,�x��x����#1u�������}"�%�����@7�F��U `m����v��wK`��A�p���.���8v�@�R����R}����\~q����07muu������?�
g���G0C4V.��-:x�qR�ELY��Ex�jE��i����k����2�]^��hj�@�f���������t6��~�W�[��Ncw�������9������<�_6W�G/��O��u�A�,�b�����s�v�P�X�[5dE�X�UE��$4EZu.2�U%\��%c�_�����o
~��
6e\�?@h��6>n�=�X�it���5��r��6:a*e�3�����T��B/�Wq[kS��E73�7�*�Ck��v�vk��n���"��6�ne�f��8��k��s&��:�A2|c�B
�7�6t�t��A4���K7�$�w�HN<n
�{$WQN����LpM��i
�v)���0.3aOEH�qY��������.��A��a��Lm�*�}��V���af����L� ��G{g6���8�����6wk��b�q��m��i��{�Zg{����,�m����]�0b��V70����{����=�������u�+_%�E3�f�%x��>��	j���{��]m?�H���-�����y��\��3��v�f�y�r��1 j����TQu�wv���Q�m�vw{�E��tcsHB�8���hYe����"$y������.&���)C�b���X�8D��K����-�+67����������M���Q[]��t�PE�]����T,V��5]3��Ln$gu;-���S5�77��������@��q���(����}
��op�����e�&,��wp�|�Mr�=r�r���d-��m�j��Vkw�����0���� 19�M��S������������H?�3���Du���J��[��	w����v7��~����d!z�� +�7T�#��{A��V7�����������f[�f�P������O[��J�R36���SE#�p��\����\ii9��	#s�����@�"O�f��`d9�zR��b(���Z ��q�0�����M	jF������eB���n���Ui�[�����Vk��n+��^��Ais�#�4	W�����s%�G�_>�)e.����:E����b}�W5[��TG�9O��	@{r��������w����H(��0I��31H�Q�e����pC�����^��uv+s0F��������9�bDZS
�3�."�]C����v�Gr~5Mz������� ���_��]�li�L�r

n5:��Ng�V�������q�mj����Q�i�?�,a�F�"a���I5��s�6B=m�P�j����f����G�W�gG�_�,K�\Q�L�K���m������-��c��Pcs�.]��[�A�?�1���B��
,,��R��x[qR���xx������w�'o��V�� &9X���M(W���k����%0��Z�D��Yr��&������\9E�J���Rz�z�����Lf5��������..�w6�"Z�7�����*�O��:Rb���8��?����i�'�G���6�]�D]H]0����6\&v��m��������7�[����5��"r������d���$�s�����'"Y���,��������m�:p
Y�|~���2���8������B�y�AA���,�b��Os�Uo��������#��`�q����6������r�q�z����Y<%��)%�yD�����W%��q���L�"7jO��E)�(��3>;3��e��w/L�`����r��;d�I"����P�����k��p������!�����5��R�P�K���>)� ������(3l�!�������[�c�S��;�u�����1�i�J&2�$��``r)l�Aax0��Q�B�pX�37�0;p����#��p�u�[�Bm(�X�����$�1����*QT]�;�PQ�j
�q���c�q�3��9mZ����D�E������??��zuT ��M�DY%������nC�c�A���^'[@x�V�������v���b�NNY}NN!��mm���l�Y�CN`7����WO1�J�|�SV�gU�~>z��c�#�1�o�9�[��#b �����eqS����FmI�G�v�
��pS*�Hq��8��L������1�����x��j{[�fgs��;��5w�eG=�p���L�����J�Y\�\9��N��:\qS��..b]7�2R�G�f�@�4;�@O���[%��I�����[��xc��(�@������t&�/���znzN�<u��\�L�v.���?�k$o�5r ��%
�A�`s�-;n.���/��b�e���;�{��;{;3ax�Y@V�e��Q�V0Vnw�yU��8�������cu����r��]j5��������G�MI��"�?������'=I��:�!�@����������z����jy�����YKu��6��j
7��;�f������#"����)��YG�W�����sI�M�������A�:_3&����]nH h`��n���&��GO��c��X�8�$L��$9o�IL��qRpC�
�
�4G�bC�����[Ma\�x��_e��/��B�����Et�5�-R�lm�:�Z\�s� ��7N�����8����8���y-�pg&��`�l�/0�&��?�*W���hh<
�3�~�/��qa���g���E=0��g�Yi������s�Q��;�'��3��
N�85��g�R/��T��q4�,��N0���b�����6��C|Y�Z���S�r��Q��7�L��:������`��'���=�Q�L(�/��=]�K�*�4�]h���~S@t��)%�nS����D	�Q������$Qi+�Jz����U�D�c6�Q�p�W�A��q��=�����5L#�>Dw��X���S�=
'���z\p@�����
m�
�t�!�*�#[]��P�J$���tT�4���tJj)�N<K����
�!ln9�k��:�nQ����lo/4�h
��\�����5���y5�2��H?!rECr�8f���YTV�5?��.iy�%��4�(r=�`��J��]GC�����D�d��8M=��xHl)jF��������-A`8r����0a&����[�3"az=J�`
�c������*������{���d[��+��	���S��H�+�R�{�M[r}}|rr|�5���wjN�\��J�d]�<$=
c�R�G�S��������.��t"���$�Nu����bc�W�R��C�����/�G?;:�k���k���H/#<�iD~��s�Z���_�gT��`���y����7/mD��b
aS_���8�).%U
G?�g���zvt~���X}���"���L/�i�@����O���LP�������h�������&H��4����A}��X���+��(�S��|a�;�i~J
�X�������8�V��0Q�Pe��F�a��$���/�%�K��RC�}���1�W�x7�m�~����g*O	�������gLW���H�lU�����b�%��I���.����6C	�R�Y�:J���
�m�����-�O����W�%�H��x&�[�����3N�5a����,���th��j��qp�����8;�\��ZoH��urQ��Ae\�����*U�PH�d�d
����H�
���SN))��(U��������d��<�=�c���%��!
"��2J�H��AHpFI%��">~)���y%d��7Z�&�����mF����>�S)�K�/�w�U�Y����c\cJ{�dX9f�������y@�������,
	/E��ui�Txad��|{��I�'���ccq���J(�������!9`�N.��k�;:�~{t�Ko�q�U���G|c��/�y9�)H��u$73d,.��I�mE�	������p^]��*�t��}>�5�7t~`^X����U�	�0�E�����{z,��B`�R��|�v��(�S�������L,o�����-^�_E��K�%I1�|,�K��c<
I��Sk��uI�76��p.)��p8U-5�&��(���IMRjk�����ET��}q���%xU���������2�rh_�����N*�
R*ds#�.1L�I4����hu�1��/}�H��hLZ��R��Z,y�����1k���d�41iW���G��7���e��+:cb.UR�l�Q����n��R���Q���?�fg3�gs���f���'yJ�$��%���d:������L�1���&k^��$��A �[ ���"DN�b�c��VQEmj�����j'���`"1�u@qm8�:��'H
�j/�B������~?��j�FE�y<�.����>7����H�^����6�^��0���a�$�d�dy�n�������h\!8�0�5�uH���9������T&bg��MQ��_�������V*��!����Z^(q3�"��5XPd��+��9�H�K?�{�U��$�J�1+������5Fr�~��`/9��A�]���Q�w�����K����DA�gPS)e8:����+i*����������/s��Jzq:�$pM�(R�bz��g�z(����w����r�W��u����W>�����"<�<rRU��n�~�	3��j_W��#a&;t��U�W���k^������*rw��8�h;���t����[l���zRT���������
F$q��#��P��%�-�d�
	Eq�Se�����}�,r�w��#��f�0R�k���Qon�I"U:s��
�E������(���w��������o�o������o�JR-�:8�`r&�����R��/�r��/�Q=���"���u^�WN���,jE�,}�����G/�
��l����\�$�<�R����?�K������=�H�2:���rQ�}�Z�hT�`��?��$PJ�Jw�M�|��4�C
p�_��m
�I���J�]{C������r�$x}��X��"�1G� �z���k�<HD*P��/b���\�J��Xs	�,��J��������=��U����B�)�k* �V��-%�J�;(����Q��U�����{@����0��O�q������(��"�``����E���sR�]D�����P��KG���Rs�����O���t��d�kz�<%�(���������lW����}^�y�]^����$&i��~3���DDp$��'���&��4���K��vi�����(3��l����K��T`!�Q�H=D�:	��0��X�8"�I��Q�LX�^;AIxb�YUt�H	M`�Im]c)I��s���E]�/=��\e�\�c�n]�n����,�W���F����[���i�Qp�3����r�2��"�i�	9���;�?"M�b���Gj<q4�(S*F19�,���r$���J�W�3���	XN���cZd<�����}
��0v�k�Me�su�-a�_��X)��(ZW�p�s���_�=�B����x�/�T��-��������c�c�fd����%��(q�L�dF����X�������z|6e��H�5K�����I��2��������R��U�1(�[;:;;=3��`2A�z|~x��d�_��m�9~����������������r
xb�����3qc�_0u�t��g`���2H�~���i�~zj$lk!1�^�}�i�+�C5}�6���:��-�=��?Q��]W���'����Y���d��<��f�9G����\��)�H��FJRUTM��j���_��<�2�X���B&D����`��UY�-0'�bn���|�unt����4��$���W���iJ^K�U]NO����K���C#g�H7�/��I�.'su���B11Q�:�V�xPs���	5'�Rl(V��f�T���|f����[��9ug3��0{��\���I3����GS�rj.�`�j�[3k�A��-��Q����l����q�� �^���������l�
�������NDSyz(���\�L�Eo�K���{�/Z��OQ,���sVKf�[`e�mmW�p�m������
?��/�	�P�*R	C���@�V��>����-�����������l���'c��H����Z����"�G]�oF�#a�d��������)��Y�t������MbOW""R*���q��`?�������JNkO�e���>�9H�����6���B��V])�;2$���
S���z���Z����{[�[����r��e=�#
{'��O��#�J���$��Y6�h��
Ey�>HF=/�m�<�����[�+����F�"U�m8Z�U���,.�-R���[��{�u,�i����hH:luH��aMG=rM�TUi��P�	�i��������yuF��~q�����u�<n7(���V��m����HR��Fh�/L��&|"����|)7;B�UP��)��I��iH�Z�����
C����|m���V4>r5M\r1�{��fW����@'�L{����=���b�RU���&����	)E�6��/��V���/������y}��f�8��h�A3�60�S*�JW�.1'��L��� Fg@"�n�[�5S+ 7�5(D�f�X�A��Jz���������Y�f*#QV���]�n��J��7���~����sF��.��/\�!���,�u���O�������
����U�>z�#wk��/lU����-����"/W���E�lu��U�wA�"M4J�6�5P���s�l����aKA�v9G�]���!���X�J_��f�-Pa1g9m�����	����� ���2�'-2�3AbCI���5��^F�;	�h:��	"Zs#co

Q�?Tt&�H��I�a�;G�+k�u�2B�/i��b�0O��\�����5p���2WE!�9YV3�dM>�y�#���W	���=+��� ��T_���G8�^
��+i�Q3��U/�m�s}20��(+���AE�K�t�u�p�T��K���-���Ty����;8P�qL������Q"�>�>����E�ir���x�����B6i��,q�e�XC��qd9*QQ���n�I��Bu
���lR�\��+������S,�4GDF�Z\���xd����+i���Z���y<��x]Y%����i;��������I�"����x<���It���G~�3�_��i;���������%>�4��~SX?>W\s|�@1��;Nj��Nm4�Z�d���cs=��a�YWT$d�61$}"j
�V�D��XD�t����?�>��������.����D�K����)�Y;�
g�,����h���Tdx��{�W���q+����������(>���0P-QD�������;`&��{��pdl� ��u��`)��$*���{[)���lo���������0W��by���}k�(��Y�F)��S������{��Jc�����������G��-�lm�4��C�0)�� �5��\��/��g?�UX�(�O������,	M0��$�����R�-�z��X����=�9ES�y>��G��@��"��$�d�<��r���So�����K�����`~�3j����Hq�r�B�bH�E�SM�}`������6��a�<oU����������%co�Nfz*a�.��q|C���1E�<��J��~�c�XL�J���U��V�����4����fM�*�����p'D��Hy=%���D<��k���.&+e�����}�4��`�U:
�Y���Lm7dN�f�=����ztI!�imQX��V��Y�5��F7����V��l��xt���'������c��7�����|���[���a��@���`+o��3���<�`E��z�1^��/��������lyX�1������+K��lf���S���)mBf��,�p�����V�����TRr{$��"��|,)�#
Wp:Mz,7��N�������+�f�i�:�Q����n�W��S������N5s����`��fC2��>����5r*���A�<%���G�u�I����9>�)�ygW�w�n��sg��,C�w���8��4Be��������
[-��(2�-�4�(O����!4�����U����V���L�E���Wi6qy0�X��q�� ?�
)����j�G�mX5�HC���|T�*�a�$����%N���������G�W����u��HbQ�`i|C���G�5&p�����?H����c<���N�����&�QldS2�����v����$�A��}-8���a+��t���v���2OX�b�P��r>�.�Jik4����1�I����U#�H�`d��p��d8��R������_��%0���rirQ�EeE�
�E��M��������-`SW`N<c�Y �kW��U�\����m6����"X�y�l$������4���q����O�:"��%A�%��������i��|��L�Z��f�()���Kyu����y��H�����<F���^}|��U�o�*E9�k�\������	�����k�����j3���Sw�}��{.��*A�-�rE]��b��#��&U��JH�E���dL�����c}��|U�_X��o��j�0m�f� �=g��5�Mld��cm�������G�>�J��r��WiRg$Pm�����y���������	�+{0���J	��g�q�#���)Z61�n��l���6g�c#�z,}iq^�z=�#��M�rW������m��M7i�.���`���h
������|�~}t����
��/�t��HO���N���'��y�laL��9�.�o�;xs�����o.��{s�>����+��T����/k������=F��Q�j�
��F�e�}+2a���P)[���O<�h�Z$'o47�FV��@�{��5�a-~<Lf���W����
biJ�;Z?'#f��X>bC����vp
��o�R�/���K]7�_we��"(C�tE[VU�8N��U�5���G`s�l����6:.���I{�9��y'�=����+�F�^�u9v�I������)��kD��&���A�O���0kD����
1��`�F�2abJ����-f�A�����N��S�B�--�m����c����@��L*^'9��������I��X�_�C���?��Zx���a(+��F]	��X?�Im���+�l��{�8�+�Ty
���B�����f�fDa��m������5��X��]r�����6��2O�������I@���u�67����#��:�:�{������b:�9��Nm0}�T�s3{�Q4bt�y���������s�P�O~���6E�a������CSwu���g�� xt����M7ER���2��O�b�"�%!r��YO��
L`��|��e,�)��k�g\d����h�`��rc{s���FM��U4y��2M��M.�.'�K����H
�!��@G?��"N��M�l���F�h��"�3�l�t5K��Q��
f�O��.�9hm������S�d�c���rBb�la����2t95\9/���J����+S�����c�:w��A�����Le6���KY���*m�J�5���%�����(�-��;W'����*�(�(2(lJ�/��n���I}5-�gfh6�>��[Y�x�F�j�&��d��DvV����������x�px|�����<�3G�s�
���n1�7w?��_�1��u�P�Q�+���7Z�yv�gA<#?��DD�8B6o����h>�\�+��m�b[�,m����Y0���R@����QN�]��3��$�0�T,a�<�h�QG��:�X�57�61^p����� F7�%�q�$���	�,3#���|Tk���{7���&i�� w
(Wh���c!gQBc�N`��M�Oa4E�'8Y8��d��+f'�	��<��WY�5Y���������H��B8�h�T!	���9�l�X�S$���rE^J���m���"H��lpk�qtj�}��FX�	V����� �IX:
|��V�2��d�3�����Ri��8����k9�nn��t+�����2o�6��m;4������u��[H
�4:��������%y2�j��}I����3�#b+&
T���Xa:(��=��^��l+��Y��.��pY�/����pX�SS�P���/���.7N��RQOhe�*��h]�s,�[�4���p	yp�e�������iG������9�]Z��AC/�#Q������3q�XDe�9,lZc��;{av*��m����mW0�� ��+��v�#�O1�����vT'�X��r��������>���AX��l�7W�X"���g���Ob.�Dx�^i���ss����^X*���zm0k��j+�1���@�A
�w?��x��/���YLm@��3�N�=��kk�vD���s7H�{������jZ9�(�z��vf
"��IjW��5���#�AY]Ml���ka�:�\k�n�����{������c'A�m��f��ze��SV�O��~�h��
�cjH4�-�Q��<��^P�)��tny�!��P}�Y�HK(����FkK����^�7=G �p�6��|}\F.��V���u[/I=�����[-������~��L�"C��u�Nl���g�JR���.�@��f}����[e3�)�be�b�G��
���dY����H���%�G*Fq2��RdWY��u6�c��6��V��y���D�$�L���T5�1�-Q#h��J���a�B1�����`�O8���$��.�~7	?��H)AC�����$@�4j�CT��P>$��b��8����:�R�nMz�����x�#���9���!�v�==$x��MkY]t����)�v�b�cP?��7O�=�<K��?���z��:=1�x�^��A"��+�
��r��;���Es68���K�2n���^����O���������OY����vG��Z��L�e���jx��I��'2�5:J���M�F����W����������o^J	
��,��$B�A�r��T��)��^MI���h��b���l���#�`�����K�yQ<#��t1eE��sE�%�wzzR�+���I��� �Z0(i�kO/���*�=G#F�h�^��{C	���������uc`#�V�7�����������{A���Yc�y�����q,*����������C���E����0����s�z��J����32v���U\����Tc&G�f^^$�MM��l��!����[NS�S�r��-�N�lu��(q&w`t-�G8�65�4H2n�S-p�=v!��D�2�$�Z������A��C a�V� �E�%���f�O2���%$�q1[��l��Y���[�]�l��4-u�/I-8K��}��>���	��q�����K�����������tS�M�-2R/mi������8Rp_�2Q{
lg��f
C#���DJYW��,�uq��%/|�Z�}�|M5Q��WiWGi��AC�n�ed_���C��K�I�f����D��'��$3�ybBR?jN'L��6:���r1���
�s�0m�Ie-dR!	���}����c+�P��*�JH
	�4�pL���y��;4Ji�R�W�������0EW�\����������r7����(�������;D�@fg�L#/���H���>\g��>|t)�#��L��[�v3v�^a�'�Xb?!��e��'�c$|�����n|�w���S�=ci��*�=���k����N�����)-bl7K
����<����aH/��b����#�r���;]���s.��_��u�cI��fD��p�����4B�?�A;���2I�Y������D�.���r>��^�����HQ{D�pMg3�D�����'��/,j�Bt��gx�UHF3�~����)�����v�eJr����2b+E0��+��X��SJCq���
6@��u
����6K��td9���[��m��"�<�W��=.����]�SNH��T;��%�s�s����>��8]��8�T��������/�#g�[Ji�sv��}�<��"��E`����y0�E�e��������x#�d��p��3=��t���[
�A�<������O����k����]����J�L/�l��$�e_1�����s�8��hz|Tj&�����z��rS�dC�W������Hx�e�$$��������v���J
��HdXd�N]@�������^	��+l�.K���S{:�sv�Tz���H�lX�ET��,���A����������-%'m��`��9����w�<!�"7���n��ia��e�S��i����_S�A����t��g����\{a��U��������!��
��3������u��"�Jl8�����6��Z������z�z���U�V���V*����J�&el6HQ[�~"Ix�G�J��G3�S�n�aw��������%�\��4	��0�/�R{G��)�f=C�1����Q1B60-3,p�2/N�5�6���Bv���6�D�����������:���I���.�n�q��Y8T�VRW#�3�GN�p��������ze�]@�C,6���v�2�{�M����7���Ggkb���Vf��
�g�I��E�����$����+#��X��}�y�m&G�uN6��bO����p"��;���ra��)�pJ^�L�	�f�%|��T=�{r�],o�n���R����P!�������2� ��U�K�Po���Ad�����L1����O��^������6��9����Ba�3/7;��J��������MM����
u��l���I��E�j:�1:W2�j')��~O�n���,��4���40��������=��������,����*���.����n8���!�.;�`��E�,����cr�b� %c�!M-���Y*S�^A���f����8�:���g�y���2�5ZQ�PF�Ad�~��������
�f2���//�`��r{�d������������Q����Km���G�%0#�&4��%����3T�M�����}6��������OT��P47
�1����8�&`����]�>{�����Qr���ne��
E2�q����������P�>�g��E�54��}p�Q	�h��2|I����8�q#�0�I����?�X���|�gG�����$�]�t�V���UwI�m����%{���$���A���;N��)J�K
J+���,����2�(>~�}�w.���[����Et�C )��$/�I�����C���U;�U�M}0���Z\�< [�>�K������Z������
�oN�O4�5�4v��y|�~���(�u`S�U�Y�]	��b�mk�35�����Y�z�~�_�
���`K!2���V�^�X,,�y��^��S�����|"y(7�Y�����hO��,����J�x�zD�=��d:�E�6�f�hk|��)�x�c
7�Mv�x�u%�W��y�\�C�����[H\�`��d�Y�%�m�
0C��`��z��1r�Z�y�3�u1gY���6k��3��':G(�2TI�1����$�����JTy�uR;I�l�=�H}��zp5��8�
�	�"�c�P�=;�p|���h�
����1��1�A�f��7iAQTy���j��I0�����D�'�!��\�lFX��h�j��{�����5�CK�_x#��}�>0
��ZW�&����4�D��B�Oek��)���L�F=UI{��t����4:Qz6n��� ���<��.��5B�^D�nc�r:v�s6�F^2��M[��q8���Y��:vEKze�%O`��>��m�e|���8��2�����$������k����syM8�\~���M�S��L�D��1��/���b�XC�����g'���%j|8�{m��sM_����-�B�b�����;)�Lj�����8��{{E����r������8�'������s3h���"C�w">�I���e��di$�(#��]:�<�vGf��F0��}�C,��_���6m��'�@���:������+{�}I�G��F���X�E#�0V[�r���Bk��(�0�Y���R��M4�t5�Z.��G<C�5������1���;��f�q�)>X@���#�a��!i-����������X������7o���i����lSS�1���/o�[�O�`�w�k�dk���QL��D�������`�%
��z�p;��>9_V>�c�W&��(�ez�p��D�>���2��Q#��i�&�**��u�s��5�'k����)b�*:��3�K��U�e�
�����.�Wt[!�n6������9xb������h(��vc)�;}��&y��$����H�x�������������)l�OG�D;����E�����!&����(!w�^>xB����R�x��~�������|�V"���5�%%u��q?>��Il
�w����������D[��cTcs�-��'��sQ���"5���>���F��		v�R�}?m�,���`�I��D,���j6Z���Gg����
��F ��z������n�]j�~������U�����l������b��k�=@�Xc�vEx�8<}�g����7gp�>z�r
]��G��!�+(q
.n����*�r�2����$�@�d.g#"K���{�,���������i�j�`�������j��V�?�����M>�'��'�Gv<�QI&#����4~8;�8j�8=}S�\���e�B�{,^�\�������2T�G\���m�s�d�Y	4R���{������f����nm�t����B�b�e�����~�z`Z������,���b���Y����/OO���g#�hB��z�l��r������;�f��`�������������*�����Tq�K`�/��x��<zu���E���O�Y���}xz~�o�O����U��}y�6�Vw<k�T��w�Y�'5�v
u<U<}pv�����H�.�~|��������]�U=}wtvpqz���9��N5�o�Ieg{����/{�
F���F�J������kB���$����G�!B�����^�_�������V�Nw�����z�~^OI�F���b��.{>�;�?`����!UJN(��Qb�W�C!x��Q��x�+���L��������[;A*�A-���:�F����E�y������^�'����.m �J�'�t��4�N[u �y����BP�51�er�f�Yw�����3v�t�F��v?�Ft0C<!9�5���R.��G$v���8�N0�j��q@^v�g�S���G��v�6+[����z��18��zi�>�R�����?�h�s�F&f���d�����5�,U�,~��h��"9K�>�[)j���D���H�O�Q��u$RFj���#7�'�#V��� ���t�*��E?Y�'L�.mF��/qi"�������*��s�������?PxU2
�@'|'�9>_��F8�}uACY��������oP�r����#��H��������
]�c���!H!�QW�P�F�w
9��Sb+'�r� a��7b�q���"����F���q�>��$5�ZG����(��	����R�����n
����)YT��:+�Td��^��(��^[�S#jg[�]V��T��#u4�+��	J���"������+R6�^0����45����Y��D�(��R>	�M�i��%[��^��!|�p�BA�9^������1���k��6D������������*����RL�d��S��
�,p�e�"����O������*�,����FtE���	*=G��F�%U��`B=�����HtYKj��9I �]�C���i��F��a�M%#���BEFfGkF�Z�"���Y��
�=H��'��z	ndf��1��?� }E1��!�n"���������Q�QF%0������g��0����b�E��������-�9�r'!��C����'m#[m��/���++� �}���0b�?��o`y�x:t������|�V�������=��A���Gy�>��}�~{��������K6�5��u-�J�rr�M%�=�������@��)
��q��������'��q���"p8��h=� 'KkWg�}d��xqK������J���^�D�5JK�2�UI�|xEq�5��l-E"],*�m=o3���LQR}~��]guS~T�!F~1�2/[�H�x��ZUmY�6�}�c�17�zl8��Cf����������Li��$E����9+����H��R,����y�����SN�v�� K�����L�Q�T���+\���c�d�;�{^��K��W����0(�q+�e�����R��Y�K��������S2�5w��
��Wg4�1J8�=U�
���`�z��}�?j���~������PpC�q%�K�x�D!V'��<G@��:
]���F�-�Mt�5�8�'�y�P�MuF���h{w�#�Vy�H���PRQ6�K7
6X�S�$6
�u���XwP_�c��G������S8Au�#�E��F���[�rV�F�uk[zO���J����\�r"��;���^���m�����k����k�,"���W���}��q�V"��Q�H�Y,
��������]D��<BI�K��PE$��t����g� ��h�����Dp�f	����;[��[n5E�� ���:������,*^�(y�>�+�
��*@"�bc�����kc�EJWl�1�P���m:�Mr
G=k�zZ5�$V���3��Xp��rksoY\�.�"ug��dO�n�b_��������_9��m��l'*z8�c�9�j�a�*$s�)n��
��p�m��F\p+��p�������q���G_L�c��cp��c����-���a�e��d:B[Xcr.*D�-3.��}�5R&�t��U���(����$���\�I|6��2xqI!������{#�)tbB$����N}��;uu�O����TiY*�KY�d_-IN��l�T��6�x/���f�ada`x)�;�k���X�8C�P����"�H��/���a��p&7]���y����(
�JW��������?b	���b�������8��G�����"������C�e�e�r���;���V���"`�B3�Yx�h*_OsG0f���1Mj����2�6*|D�E�l(���({�������XAe��R"�n	EZ�|_>0����x��H��GF�ac��DW��f���\r�:��''��j���V5-�W�B��6�}\[��"��Lq�${r����<��W#�4"�[�k�N**+#�0���;oT��0ub��)�h��7�����FF��q�C��(
�H_!y�.������Wy��p�f_��)p��
Xb�'��2T9�{��bF�]q�d;/�m����O��ifF?n�%��m��Z���u�
��8��Yvi�d���+���%��;��m��b��1r�O��0n�q�"1�iP��4�k����\m�B�
�<�+"��R������h~��3�����r�X+����D��������R�'Q�S�����W�@+7���	���a+\kA�n��b��/[ZF������WHN�B����eHx���u�����h�ZF D~U{�Se���V���S�Y�NrW��(dP)��#�y�����N�Ui���D��s>�����b+���7�.���������ONV{I��T���3�r+��
��0r9�"��/dfR|����{����$]x�����m�Es�f�]G�h[]�����^�.D��$�&H�����o���I�U��o�t�$ 3�Kddd,O�(���u/*$/��%X%9��\����\{���"��8���X�����g���&A�T\�b�m���4�����,��h�����@Tu��5��"9�������C��~�=<cG^5K=���-P�s�
9m�L��Re��Ie6c
��#/(9�Dv���+�J����V&�F�Y�����x�S����[��#[RGH���������;��i�d���Uj�����i�T
��G6�R���\��uO+�gB�nP�^�����z�t�}m4v��{���5[�<���R������k����qc�f��*���2^\�PbP9m�
;��6�����9S����e����6Xm1��#�Z�Y�xe�$�Ao��>c���<g��%�����%y�BS+�5��U�\�"��gNA�T��8���(�&��1�g�~�C,Y40b�j1;��cf�����;��~�������-���lY
�j��~�Z|`���r�.���6�����A����AjI������#T:g�/~����0L��*����V���jq��B�\OPxC���f�=�����o_�3�f��F���������3I4���C5�m���Z�����2d��T�&Co�]/o�a3��Y���g) O�	E����������������a����]���U��e�m�z��3������'�"yvs��0s��M �|��/���D�Z����R�[�����R�~�XY	�������-���8��u����I����������3����z����W@R����R����N�a:!����S�Y�n?�_5�z}w�9�D����F���J��jw(&�=��c���0���i��_]u����
nQ������u���W����3�F2�M���y���n�?��I�d�T�m���������l�������:���$�/i��y����=��^������v�����;�^���9u���o���O#(�y�c�Vk4��iR�������Y�$x><=������r���G��$�D��C��N�8���G��7��})c�
^GOG�����-@�����3�v����N�>�t����y+yV�/�^nM��?P
9W�X�]��	�"��yI^A��u�3�_���]Zn"5+��"���i��K[�|��<�_�:4�v"�ud��<�;�����*A�z !��HS�'&�1�7����(�L���0��������b*�dq�z���$��ic�w��m�} �UD�.��\y��!g��6�������J:Z�1�n_������E0��z���I�����/lr���������������.���������N+��)�8�M�dZyx�����d��i�N���jx�}�"K�W��������4G����#�(�U)\@-�:z�������@~�J�9������,-���_7e�}r^]=kD�����M��������#N�b2��@_�a�,)!�Rc/j��`������&(k����R�����s����!����e���Z�������o	��-�i�%�X�899��W[_��.��u�B�p;�����)���o&E�A@HWG��
�
�p���(z<�R�?�)I��G��!����K.&���Ne�!,�����~
N��j������Y�;e��\n����vo��U'����(O)�N�����k�����x����7'�d��4�����\9�J��Mu~c`�g=.>F�/q^��w�������������W���'A��7����I������h��-��T�%I	9L�Y���p�`��v������,�0�F�-���) �Z�2�e:��N1�V�9��{��;�$��g-2Y�aw�*6[���&�X�RU#iPvwE������i�Mb�5B���\�N��u�� �q8��;��������FR'= �X��V�]
k�R������]~�����w��eC=U
�2{����c������m_v����t�����.��|��p� ����v���6���lm+�������������$�a��a6I��&��xZ���7*�Sh4xt�`��$�{,�e5�� ]�8��- sF
,Q���P(��z�F��^�=.��c��1�c[b	���h�LR���_"\_�B�a=,Y3�T��K��,J�Am�nu�V42���m���Hr����Z�����DGU�������>���fX���	�{Y`he�Jr�G}�9�����ZN�C��\��z!�9����1�&�4�8�@,�}K��f �����-YLU�vS:e*���_�(����
���
"�~f����s������%�\������(H�Q@�<�����D�Km;�9��|�R��tBf%B 	�Gu�dy5�R��AG�[����$��U�A�%�
�b�����u=���R���w(�(��l/����
��'[�J��|��My�8�x�g������m.!���J�M����,������SQ�x~cu���e���G��6xbEk��(Aj���W4������f�`&YaD�q�$����Q4��T�`�Ek��r��ut�w�4��c��hpD������%��	���#�.|�fS98����A�(�����Fk7I�D:���j"��-?F�Y�^�j��N�P���{���@�Pc3��4������p�P"f����D���+A���A�0)�4����1�T��D���7�y���B�s��t�q��y�����t1�d	?:�?�<�����O��)h#2���` �HD��|N���t�������/���B	/x�t�S��_`�&EL9�6��J�b&-{P�_���r������s������\�����6���y�WBFP��m�M@P��+ F(���fU��+��hb��^e������0@�-"oa�k�LZ�"}�)!��h�������a3�����.k�D�f�B%t��-p�UC $�C��������RZg���7�5Y~|�lqy��}�u�o�X�\'����k5��s�6HzZ8�}�����?�|>EI���RRs�W-��1e�(�*������ 5��td|;H	h#u��f:�����`�$^�|�P��d���3,���k��Tdk����[.�bD�(J��CJ��!1X*_�I~�``�a
j>�@B��T(y�C�A�iz�����Lx��3�����DON����X����0�R�j2��[W�z�?�7�{��e���PmsF\�6��V���' �����0J��s�A��Ou��-�#�D�-v�{V2`�c���{�\0b�	d���E~Yf�4�n�!SR%G�+�r+=�����Wn�����l�FW����,
��EB��]��"$��c�A��g4���b�Pc�<*��!+c!�����������1��o��ak����_�$�-��C�y[��/C���8������1�M�q����@a(�����}~Ci���f�h>����j�{�&�9�s�{dz{����~v�z�����IM�������4��4�1?+2�d*�����)���YM�<��8(1������K	
X^��XvF��WQO���u*�R�>9I�]�u-�9X��'������L����n�H�^ ���R��zyQ�g � u�X�I���<��"�u�	Ki�8#@.FH��8����v:�#b�zq>F+���3%�eL���^�a�����U����[{WJ��,cRvEL�.�)v�|?�R�YU�A����v`��+�N,8������X��6�,adcc��~�I^��S+[L��It��~V�'6�V��c�)���DQ	��d{��O�l���~���W��"�2�������-
��1���`��s.��")	�����H��`&l�&bh �S
L�(r3{B�t`�16� <��Ic������k����1� 6��d48�>�?�&�k"�ks��,z@��yq�p�!
]��e��E��N����	�����ZN�����D$^����"=!�ZVyS�n1�j'i��s���\����,��-.l�@/	�r4b�;�M�=��[������=�y}�]6�1�����
�n�������po��|[�@'����g�W�I�>��a4}�\�H ��W��p0��W����<G�!���W"��$E�b)n"�A��6�aE+���>L�PDM�8�8�#���akN�K���k�)��b>t��6H`n�~�'�����Gj4�bx}�Z����B���D���"��)r����	^��(��k;��@�f<{^N��:;9��,�gpcYij�gG�ldD)��5�XTC27�vQ�kH.����9��E�P�cAZ.y�%/*��t��9-�y��!8�-�"�8L�o�Ee_3�j]���^�^����%|��<��� Zr��>���-��F�L6)[t|��<l�~��+�s�WE�U��fX�����NJ5t�l�<E=9�T�&i���w2��lV�go���;~vz~�I�\�������w����wF�RP4��������
NB����4v�#�A��6[�mU����Zo�/��N�Q�YN�����(��/��<��
�`�������)"�Yc�������^���`Y�H����$a�%83U��������m+J�f	�Q*���T?^6��:��+�{)$Q��<mh���7h�������]��+�]�B�=�����<���"�������m��������g���$S����<O�.��^�"zbi�*y����n�����3x+�+���x1��i�k���yiG��q� b���y=V1e�g2K�}-(�Ag���^�:;WQT�Wr
��+W����co���������)�DJ8���_w�{���p��w�N_����()��}s����W��)���J�d�+��t�����������S���I����������C�q��O�O�����/��j1���z ��u��������m��h����vk[�l�;��:�w�F��������k
����*���b2KJ��b�a�{L��/�w8�ZW���I�F�x����.���<��[@�������[�����k��
�A�������`}C�@r�`�������E�dk1}0'�Mc���2�,���P�4M�1~�cP�.������0������R�����Vso������V3,����R���#U)�j������ZT��Z�imT�^*�{�-��T�RU��|�~=J�������� ��I � 	*U�:������<�\�$��t�N���.���x�L���+.�4_���Yt5�i�(���j�3h�����%�j
"��-�	6��[P��{��6�~8G��S�������5
(W�k���q���-p���~�J.u�0�]�+��?fZ5���{p8X��	�Kb����C���R����������I��`)B?�!�t��z���w��A���G�'��1���h��miu�:�U�)3b<P�)�'��p?X^���N�[��2��yDi�O�!�����P��%�L������&�D4��@\<�b�����4���<����3e�c�c��|=���+�Owi
�3�) q�O1��c��>�u���i�hy��^����~-.�
F�M��MD��we��Bdh�&X
��E���E����1�rN�i<����������1�1����~H�)��P��8:��i�?�%�:.�_u{��G��|�����;� ���L��>V��?���0����Wa=�C�����3%��VzG�����,���5`-W�t����j�)p��������cW����<=�KQG�����<N��������]�9������?�\P\L������A��H�����<q�����dQ`z�"��:v�+�w��1�sPL���+�������x�^���
�1�+,�t,_C����*��W����e���lk���W�g����[@�mn�N}z#3!K��Q%��cV�����L#�-�(�a��M3�+F��5R�s�{~~z�������cM2tF��[Q�?|�a�{��lK�,���}4
������X����������"��9�+x�g���v�\��N�YpJm��R��N�%g���,�,8��������Xp��HXJ�K��� �����8n��l}��E����D�O�k�n�
��/f���dA���s�@��B5�)��hu�cI�(
n�����kV9���R�!��"�'�y�8In��u��������)��a������4�|�����7�'��k�����@ �1H1BTl�%��%������${F�^����3@D��u����y�"��y�4��ZP6[pH���_��|�����m�����Y5��zE����l1m���sG��i������� -�<>9��������������iZ�}4�M��'���7��EL�0�X~���lnR����]*!3A<�ERj��x��$��r�beD�� ��4J+�=vLo��'���i����<k��]�
�J/=5��Q��9���To�c!/Y����.���0~�f�GO"����vep@��!�K'�nQY��U����/IC��Y���8����?e��_�T.�g���e?�:��6��j�R�RP.��Cg�	a��n�\���c��)J��S2[���� �g+-��#z�I���S�7pI\�����D(TH�j���q:O���CJ��Va�`t�'��!�,�\�RJ��I5#�J^j��>p�o^�C7��+�u���A7�1cEy��Z��S��G��nK��P+�a�9���ZDY��U�S�*��:���&7ha)P�y��I�%��R�Wd��>}�;�����4�U��x�������|F"��w�Q� �&�_��~;t�#Jr�b��w.w�$V�%�lP�M�����9����x�|�����Rv���6ww����~�F/�������dY�dX/������<t���"��;>P���#x�����b���S���v���_��p���C�$��������^<y���������p��q�(na�.� x��&R]N|���� �q�!NNx��?�C\���8��e���d��-e�bxO�Fd���%��;W�s�=����R��g�`�[Zf�b��^m���^�i��@��i8�F�L����������?t{���Tm24A���$��14!��5����0^}����R�� �d[Z	��qZ�~l��H}5�Y�xw
xca���>�[���5w��@�t�,�:|�V_��	��]z������Y�?�@a����679-#��z
+{$E��It����~0�f��}�Ex<C����jBEx"���������ZJ��-��-!�%k�$���I���i���W��g#j���uj6�	'����G�;�����"�u���8S���0yf��E�����`;� g~:�>=��Kxp�C�?5H�Xk���	������l3 ��~V=���a���������4{����J�w3�+�j�����j5��jg�Y����o����V������f�,�KZ��y�P���0�?)q���
G���$u�����p����OOx�+/��rrz��x��d��.H��L�����Ap<W��#eZ���%�I�-��1�*��82�����j{�#�='q:���`���8�� r�C�Y������!�>�����xC�hHl�P��\7 ��|DP�ah\�_O������d?��-M��2� 0*U�WB��H�`_������XHHS>���BA8�H@��������<V}8��3�J�T��@�)���q��n7�uJ�"�:�B6�6�w��FT��t���S�?�D�n�n�O:7�8��SpI�!���*�#������s3W]Dy��0�pR5�Y'"U�P� �#��9^^e�(�yD�Wj�1<�����L��4=���$���K%�=���2|�[��Z�#���)���^�L��+�N���`�D�u��+��F2�+�&O�o��P�L��y�?��&jjl�n�!S GY��IrK���F��t(��s���������:�`55D���n�A�<��Yy)H��A��s��KN��
�2�������4�Hx��W��0��$n5��}k�������p1I�`?�����h0B�f ������O�`3�8�������x���R��KT�m��2P
C��|ja"�x�5M�fv�lB|J�t���8�}���w�W�+��Se�(�gH�����8H�aa��:L�9����j�{uh!��MB��EKK
]����TSp���&=�c8#m�����Y�%;����	C��0�=�ME,����m��,f���3�`�)����P�@�������2S�j�
��3{O�}�4�pCO,��g�G>N[DW�k�g��(��:W �-�}��{������5pG�-%
�����K2\)�����P�n�j�K�����7����������9G�T�I<�����s~6�Z�0�h���#������ g��L#'����,���4aoEf��K��9k��3����h���Mtk��:��E�����kW5�Q{���%�� �������	�R���b�s1��(����������*�rp
#*;���EyL��"�kQh��v� ��A�2��p��
U)y|�y��Z($�\W���f��N�X�3��<s ��1��cIC������F����S�3+8u�Oi���"��N����2f=+91�	�����3������0�.�<S�&���A�B1_�=�<R��^ [@�$�)[S������~a�D)d1UD���{�)�*����<������L��k%m��}Vd���c�5vpb�+ipi�������}
^���8�`����F���N������&��<^�#���������'j�@I������J�,;N���iPq��I)���'����?�����!$\M���<�|���*ko���Z`������?P���`���$���Cr������U�5����?���:��a���6o��T{�d��:�c�pno���]V��l��*��Si����a����":�8�<���z���p���
�,s�E`E�N���[s+H���/�(UL����R]��/����c��;|��F|��Y,�P�V��E��2RW��p� ��������U�'���}���x���a�t���� b�s�������S��b�Z��T]���C&i_��C�p�X+H�bH=�zdD���89i%���:K:k���{���(p�\������=�������>���Ee_<'gb!�Y-���>9M�a8C�;��
��t��e,��m�_�
���A����ta�_r�x��N�O3����,������g�������f&�qJ9F����
*Q1	o�r"���q�Esf�ke��$���.��h���p�QW�z�`
eLeg�<l&���k�	>�������QQ{����x����b{bfJ ������gS�S�nF/��1�l���1� +gu:g�L����4{�uIyX����)[1)��� �����%7�|�2p�y?��D�I�q�}��zm���E���|����Q;��Zz����Uz�n��4��3��[o
���?�c���h�������c2�K1��HTL�~l�c�Y[-�q���+��<���=3��������"V�/;�h�����:����g����>��	�A����c����{�v$q��aM��xZ���n�wy����|��Cx�X$���������������X��e��������F!�(�������!OU����b�nm_��Q���������^��a��N�E�Q��t�#=M���A�/�����%�#��,�T� 6^@�W�����Lxb�
��x����;*�<�rj�����K-�0����(h��$���6�:�y`���� z������p���~�77��G���5��Y|e�;���>�� `��L#>}n��E T�����75&y�8xT������R�P���C�:���B�����U-b�<�4b�9r��]�����fUy�������,e
�bQ����u�W7��8Y��*��)����h�\]��Xd
�,��o�-���%�x��*wG�,��Zx��(7Wd�|��e�0���0�7"��]RQ`���p�����45"7WFn~����hlnf_�jC����jk+�\�'��k
��(�z��h��@/[�����*��&����b�AV�'���,{��mme������x�/���a�l�����o�Y���$&�y�hVQ�=%�����A>�"�J���I"�p�{�h�_����J���v���2q��
�b����r2�_P�F��{��nl�3�w_���Y`9m�.���u��F�j���}���n����Pog� ��6B�HTD��.��<`�}�4��"��,����Y9��uV���a��T�������e��$����a��+��A$�YYp�/,��<�^1����uB�J3�>%��x/5�N;l���a�����^���r�6�����M4j��8�v0>u>���J��[�5n=S�R./!-n1��z*�W���f}����1��q��Oa��|���h/�Z��A+���ug9����Xxs@-���5���U�E��Q8S,k'�����t|�6xu�<��g����u�'����
���8��~2v���B�"�l��`)�f|�
�*Sj%W��^�����/�����i�FN�7��N�������'�!	z`h!n���������)����
�����@J�TqO�{;������fw���"��G�������@y�����6����{���PRY�)��29eu��i3+���W�2����6��)$��`��Oc�������Y����Xq�� ���v���N�w����������aS��`���
�g�Q��R>��u�e,�+��
,��?��$>eB�u���('m{�0�h|=p��hI��\�;��~�J]�w�!�yy���Jq��[��m���.^X�~�����U�^'����=x8�?��P������{�����||�J�e
�I|..O�����C;
����Df/4���J���u���L�y���j4U�77��4b��A7{J0}�wWM�����������<u�Jqo��'���P��v{�����w�o.�g���������0<�jK�V-��Qr
�D��l��Mx����Y��VE\b���mA�Z��Y�w��hEmO@�,������eW�A
������j_��Q��_�"�|��R1Ff�p\�F�\��Eji�4�I���O�SQ�����LH���'���i���0��X�\������<+C����{[��M���$@���U^F����u������n���(��$� #�E!%������e&	�@u��Y�
q�F�:�I��\�a��}	Bn��u'$�;r��A�<0,�fl�A� j�`F�����@�F�����0������@1KuT��L���Z�O��M�%9!-�)�!�%p���Z��������G��e�Sw�9'��E��1���4u�D)���HGO�B^s{}���Z�9#����
��i���4���Vn�]����-_�,�g#	���b��=��&$j����j��aq��`\�Q����e��qj<�)v�&l�j���0b����g��kQRA��#��=��@h��#���Q��V�P$�1�I���Lb�E?��;5f�*�J�|�������j!*^[�#,���3���\��C@&���c4���@�]���5�������Nl��2���A�G���x�c6$��y5�Y����kEw��#��� ���Gl���g�
	^�s���<��pl.;�~
Gj)wG�]���������1�S��3�<�����bSE�=t~G�Q#��hL�'c@�����1Q�D���>�,ht��W����� �S1���hW���LW��q*��a��n������'[��cy�0&��������)��Ls�QS�g���S.gPf:k��w6�w� � ��Nm���,�p!��w
B�@�����=��g"C�1GM�':��p���a�P��U*/8a9�F����z�*�����	D���*(��}i!�h�1��
x�I^�L8��UA�7��%�������V�����#�����UU_�M��k��"
%�z�����G�n��)%"2�5�|s�}!�3���aUw�f��|�Y�>P�|�1Q�yrat�����_�1o��@7O��g1d��F1k$Vef�mc�22 ��\�WeCe���
g�3�e�5M�N�D��m���oMB]|23������������"�
�w��rF��T��9� XU���^8���2��>O���Y����i����GAfi�������R�	�l��TX�FV|���JL
9��87*��#Bh��+�[(Y����9���[��kq�%"^G���znX��[���ZUI������O4L�����>,��tq�@��0����	�o�|3�X$��Q4�#:b&hG��!"�S�'��F�C��4t�T /$���C�T�THp�Q��
<���|����������Y��C������c���[��C��r#�btg������
4XK���,�����4����h�����		0$/QC���H+�1
F�G�Q�7O����2���JdE���"ucI5�T�J��|��JV��.$�04��}N��]Y�@~���<�����d��r+1�>�U@������4v��������y��_�HAM������A��)%�=!�4E��q..S����H�B�GsJ�V��g�Uw�5���-f�6h��.G�f����W�����������omzB�o�D@L��ZCJ�n
-e��)7��4l������I_�5��T�p��������
�\���0��D��$HC�A�WX����"��1���av�`��4��KA/��'������|Z�y�p�����%o=4�G��ilq�12o��Y��#2��}�{�CA7}����&�mO_rr��Re9�U��|$)�z����}!����2�\�M�
�|���q4N��>�����5�o����m�'.{���3����kJJE��������*��.��+1��qr"j��������	.I�8"�9�<H���g:>�����u�Ba�e���	��%���o}J�0D]C�g&�����gI���9a,,g�|N9��k�%(��	\�������Vdf��G�y|�T����p�E�Q>� z%dA(4�Y����pF`��ipb���qd��N������M�G%Jx���
��Ot�s	���?��$���"�<_��7���
3�MF����bR9I�������-8=i���Kj�fWW�qkf&ic�>�S�M���R���!Ub�^W�d��CK2��<����������d�N^�t`A���b�S��Cd��;����s�(�1H�-�#����h�c=s�C�K6�X������E����P��������������qPw�XB����`����<I��X�Q���6�����3�
�Ni�v��PSu���K��D����#�L}D/m���X�h�#DMv3�.(�3Ht&���+p��.t���qj��KW�rm�_.����K�l�j�*Wz��"�%s�W �S�i-e1�(z�	\��-��2�!�4�L�������q��8i�bU���	Z��$��o�c�H��������f6�������&H��0L����M������[�!����bFe`�x�}b��a$F}���3).i�K._����K���h<���_D�2*
�g��|z3m�!m+�����R�����\��J�9�VG�
�)Z��2Z��(���-��V�X�r���YSHD�[v��:G  ]%c@�a�n�h�Lf��'���b��Y�&{��]8�9��'gv���8bA|$EK3��,m2���r�DNY�[�j���!�a]�7�(f���0*7���$�5;�x��D��q����%kJ*^���5n�:#��4�>�>���D�[!7���w�M�A�m�+2O�W��Zi�&��4�*�\I��.p������^.�rE*�}E_�����%^/6�����<�(�1|k�>���5�t�G�Fz����r00%W���9�
�j^e�Y���[���k�s���S����&�~T��	���?0��3�����;�J���w=��k���������,5��x��r�${8�����<���Va�eI�q�kf��D��$�n�����l�>!�\�<�m�\�4f�$��c����A���A��/V��&�]�������s���g��=��W7!{�������?lnRg�-��I]����bUmH������3k��M�
V��#�S�����.���c�C�}���?�Q���(�I��c�-��eB����G��N�%�x%t5�/��*@_���uv�T�[�G���a��Q���b�]B[�?vm��h��M�#�x��A�q�:�X�a��������������M����)�:���������.�BJC���	�<x}|~q��p�=�u_�t�_6r�)��)
h�LiQ���Jw���~+Z��S�+�t{���(t+�4���(]i�lK������
����U��,��[��V��4�SQtH�s�ia����8F��.Z��_)�����	������02�i��������w����CY��\������d\|�[���������[��_���d�G��_����=���������S`]��p�w��]�y��:�����i1�����d���U����n�w;9��V�w����_���?�n�|e���O��d���#/V|t�8!��F\����`���	%���3g8�K��Pr�p�e��T
D�1U�^c� ]�e1�I[��}�	�R�*�\�ZKfO'Jt����������N��*�^4h����|m�!e|�@��4j{���o�d�,�(�K1�@p��:/M���;!��	�VK"q ����)+�:h�`(�,�����C�0�����<�T������k`��'������b�gOi�,����BxV���D'8����nd����9��;���,�/��*�����1&�*�*�,_�y�Np��jYk�����x[8��6�fs5%�V�����ct�^O��ja�e�N���}����i`e����2�q����`�L	����E��2mP��l���������@���Z�%����D��2���C�@�>�;�����J����Gn,j0n#"U�;�����,���j/��A��Gz�B��^\:���S����d���Xn����{4m+}d�|�7K^[�R�~�%�+TN�J���{?b��n�U���������������	���&���'�e4��Nf]������=�.����PvG9^��[�0�`Z��lP]��{=�����e��h�Q�`�pP�����@8�u2F�bD8>�+��a��~��
�\��%Jp�!0���q�*����"�?d�	�{3��@���'G9���~o�6
15*�H��#&�'<���O��s��@���d��HIt��m��K i&��qDc���0�
Il~S�M2i����q��|A6�[�]� �1�����s��d1��7*�2��)� b&�����"�3K���]�6u�F@��D�5�.AzH���(���R	�E�������%��'t�j��	��
�%%���r��B��U����Pp��l���9a�i$p
������x��*���sLU�=����U{f.�Qk� M���X`S���2M
�1c�q%k����pq��^���	$�&���Q�..f�g�|%��L��g��Ne�)�I[�D�>��R�[��q����i+;�I��xg��xZ�/�O"����pB���e��)ttjuU7ih�i��������clC
Y�^)gqH��0���Ha�L�)����=�d��������sE^�#����q��p�P�I��E�@M2����1���qv���.d����!�'��&2��=/�>W��vh�{~�=�z��of�OO��2��d)x.���a}�Z������Jj�g,��K�
��T�$�*~	W��,]{ ��������y25��&B��7�h����E�=
}��C���pb�b�yE���OU�)k���Ko<��v6RG��#n�$	F�j7��85�>�����D4������;N���MS�Zq��9d�����yIY�n��������{�i�9w���������;;?}y|�&x�%_O��)�t�S����:��!o�M<]
�>M�Il��[L��X�q?���l�+vT����?�r>i�SM<0���Y(H�i�i5���rs�.��g����Y%k~%�oOM0q�������8�Si3���E|Q�gX���a�=t~�#O
���>��fL8
���$tb��A�fTmS��I��n�W�m�����exf��I�an�B��Z%�����OQ�u�_��m?2Mz����7��R��b� ?�/����/8x�zJZ���#���Cw����>�&�R�+%����*@������Y7��'����>B�e�v�.r8��N�m������#�X�����z��^k�s��
X��x��s�@�^d�\��D`<$�{�S����'��F��5:+N�@�w���N��y��
�*Y����}E\pk�^�3����(jG�2)����i�2T��Q����@j�;H?���D\	�+��|����B]���8T��o�m�)��"j�����iZ����u��� 
3U�n�+I�x��B�����x2�f=������k,i�y�i�r��B�+������s��?u_~x�:Uk=V�������3�[c5�j�JH���/�?�{����-���������-�x�$���+dL��S,������u;@��}���qG�B��p�D
�(�F�"|X��)�AN����kL�����V~=(,<J�[=���wq5��DdW��z{�$g��g<��|zt��or�T��"�	���l�y|���M�$G&t	'���6@�!eN�T�_{@I�}DY�����oU��J9O��p��
4�OV�#<km�,����P�����|8�"G�Ud[|_'����!4NH~���|���_POZ����c�b��%_fL��E��6<��lS0�h�F�T9��k_h}v�j��Z���Zs��B�P�D �IP5��!��b��H�k�N��u;�Q����	�&O$5�d4(c[\L
�SJ��r��l��@�>�����9��eU.���X�`��w�������k������|�+~���IQ�D��n)��9��p�P��\�d��e.�4����O�����^P�B�2t�v���\`+��[���U�F�]�o�{aR
\yO��;���?p�S�qR�a<���[�p��f1C���"�P���	�5@��5���P���B�j��OH������x|d�	��Ex���\��U	j��q�"�����"���c�BU/�!������A������W��S\?6�Of��W��	<1F�H��;�����~�lk�N�[3�]C�5�W0�7�h�Go�� i4��mj����f��s���or��5K���,1��x�f��l:fe����,��B��h����&����T���.�p��^��	��Jl��Oh���I�:k����-�e]{�����t�FS�+��A�p1A3�XO�D�>E<KF,��l11�mxW�g�
�J�`��vF�������2����Hm��^��@P|�J�����'?�nc��6sa�E��l��C�9>99>y��F� ���x�R���G
�z�=<��X�:�t�9+������n%����u_��Po.�������BVVp�� S+{�)ze�g����GL-gj���	����`��T�-
X+�N����0�1�"XcO�>��&Yu*�~��xl%�����A<��N����[��`Z������$�������U{��tx|�������"BP�������h�	��?3p���a�������'�!���>���������6�F�g<��&�
��:�(�_5`�u2}Bc��	�}L?#��������,��L�T�y�l���b�?kr5�o�U��@$�	�&O����� �
���X]��!o��R�m��l�~K_����An��M�Q=�`|��������n^yv�{�r��u:0��S��f�P�������1��0`fjG�d |.&/��)H����:�z}HU���GDi�$E�Y�w��`���!& �N��[:�����
}'������{���|�K�Z0B������/E��)�9Dzh+
L�8����H�?K�l����}#b	���JJ��������,�e�Qp�F���������������w���-Xt��O)}�;�lOn#Y��x1���@)��8dH��Jh�7WG���|�[�+�����I�:�LEc_"���*���eh����a>��V^�z���
&�R@�WU���^ �f�f��5��~8M�>*��j��#FI����/�4�"�~zAZ���;�S��m��H})�h*���|��5k-�4[�������n�&��9����Qt��^������$T�
/\�D�1���,�����=�����~�C+��Y3��\s���czG�p��@��fGX�%�pI0�[tt�'y����v��������I���������<
V�.Y��-�3�O���K���g�%N(kwa�y��R���Ut�V�L��m&�si�N���N'����B�]�
�/h�y".�VG
u��qKMUn1m_z�����l2�.y��,&�}b�Z7 ����CT@v1����R����d�>(��v	[�EC.��-���gq���=}�>�;1�99�5/������?LS�d�<d�9w6����>�L�$,ln���f���5����������x����������Z������E���U�����.���Q�["�E�"�8���������� ���#w��������	qPzp�v�{�������IW��b��Z�c�F��H��^*���ER[]�|���p��Y�X:
:xO�d�����yOq� (Y����1��ly�ofP==7[�V�����t��/�����R��XD�a��������������828?*�N�H��{EE{-�f����0z�����n�����d�>E���[���&7@��Vu����_���c1k6��Z��X`��]�|����'���������Eu���^S2%��
��dJ�53��5���	��������'����g`�����=��l5��.&?8�\F��*vqh���
��:*]Q�rC�J�
��*��|�P����B'qq"��@+8#�v�[	��G?�S��^���^���ZQ��
F��P�g3�gc"(������V�)4���D��Op����q�jb /��������>���*iyW]�9�/%e���0���,��
PE�.�>�s�?5�m� W��.+�,���ZjUE�Zh&�)��J����D���Z���!S�RF���_��6
����������}�[����u��v�d�.k�������\����J�r����"�g�-M����@_�<��8Pv���l�w��>*����@52^��,�q��<%�����C�t6���\�0K���2��n�����o6�������!�5`0���C- �y�50���|2�����q@�U
p���x�h�~~��
�P>���C7v���x!����ik�1{�� el��o��fg���=F�R� tcUA��!{Tm�����O+x��R������5����Vc���E �L��������nl���{���Ib�A|4�w(��)���_Q��U����B����m
��������'�2��Y
���Z�������r���=����Nmg/G	����%���5�^�e~4��W��
���$q�;��@0�����D����I�b���5����g���6�����"�����c+
�uD:�?��G�#:�+z����`��2�W�� +���K/u�b)�E�|
~+��|�+�n�u#k��;-�������yZ|��/��X���>%�yT�J�����P�c(|]`d��� m�[C��d	�6��i�8|�>���96��������OM^R��@v�z+��,Ft]v�XF� ����i�	,k5�����V�hp����E��c�=%Y	�g�3�[�[cr&��na46���8e�O�����`�.=��.����S9`�6�>e�)�1��7�e��aH,:�I�IS��_&�<�VL;��pK�i5�h ����H��N�a.W�%�]Gq�u}}wzxT����_'�kf:f��_������.���k�Jh���!|	��8{-���K��j	l����l��atxsg��j�-�Q4�=\b���
9����FWw��NB �	2F����d\����
�i�����e�v��D������z6����k��(�ei�	��l ]a��/����|�����W���������V���hr �w��N-x+�.s���-h_�0�[��G1�LH�Y������\�p�"=k�����p:�q/����sY!�(����x�����l��sM��tN���L{2$�I��7��G�3������u��#���
f�lZ���\Dq��Nd��u�l���h���j#����������h����������]L������k�y�:�xK9x��5oI�9��dc���aoP���@�W��(��&og��l!��N�����U���H�_��\�'6����Y��DO����T���Y}qD����Q�����#A���`\��k(�yr!�o���3ie��8�T��}�����4�&A���g�S����P:�j����"yL����������g��SL���*��1�5'�3	�v����[u�z{�����f��Lg�����o1Y�^%�����!c���4�����h��0�vH�������<�{��/
Md���	=K�xu������	P#�>}x���V�"Vt?��F��.x"�O3���N��z���������.�5��[�:���!�qr�����@�������8�7����]�j,0�_�����%�6/$$l7����i�*�>L��E�8�PD ��c�X�'Hm�5�[�����.��s�P)���o���~�>�i����5��p;+�B���[)�e�9;��8N��`�
��P�� �y���m������K����y��Bvs��[�6�\���&H��J�Q��m�|��J9`/}~�xb7���N�,��h�\�
w
r��^��V���D�V��4��x��@�wI8XR��Ppr��DA�Bjs�Q�tl5�vJ�a�8QR���,���>�S5�����*�3up��@c�����G��q}�������79jK��	���B��T�����%B�����.�8Sl-���g�[���@�u��)�sW��!7z�0u�o����L�HKLX3V]��C�`��;1���x"��j�F�V�3h�"�E�|�s'����I����S�P)06��2��V�x��{U|\7�eR�D��^���������$�f��H�7�z�8�$���e9��;���r��&m�����e��.��21�2t���}�B1�	v?@����<&]�n`� ���U�9���Q�
���\#8Cp��$tTI�]�������f��5�k+@**x}��]���e�����jx�R�hC�+��
]���A~t�%��vj�&�
{��:�����<�	U�]��8D���K1�%0RC!��������������t����q���[4P>���>3�����������[��6E�1x��hM�x�I4�&�K[���H�����{���kO�[Pv�����2�[��&$"�:Sa�k���$�Sa�9�1�
��0E)ps����b#{�����rBB�/=����w�����D�Cg4-���^W�{�*��+�$q3���t2���U�lM���Y�@�]��%�l����{�)��s0���o���'����1�PO����x���lw��T��k�>Non��}������6sq��A�k��5>X��}��b@vE�t�������A����K�A���v�]k��s^ZYm���&�����c���Y�������w�WG��8p���+s����ybH	q�ZsK2�_����(��	V�������\�jV�|�l2��k�:�/�`�l�_��Y�����Edn_����+�n_:(T�����O��d]��Y��	�\��f�)����
�����&,����;�}�J�_�|�%q�d�1�D_�)�]�Z4��s���Lm?��&�J�g7�Ln2�8�����w���J^����{���Dq\X��f������wv����_K��o���|��z��N���Q��1x��vr�l��vWd��@����4�l:M����q>-�F���Q�O_G�(|m8�
~�R��]| _���g�Z��-��V����bO��8 x���u�-����S����-����+��/�%�TJ�by��g+�I�������;�&��iu@i�:�D*P�x���������cK,���8���!a6s_�������7Z�
��/��mn��j|��+r
�y���,�v���S>*��L"}��(a�
vT���O0��1�����Gba��@����^�#�-]�F�6�w���RnG#����������[wk�#�Z"���j���pf'�1j���a�8{�s\��:�$�;��gHV������Mk����	)�
���;tE.�H!h�}iW����S�;�
>�>�Y%sFG'r/��8�L��&��d��'Pn�P���,�4j����_g����si��pZF�%=
|l�K�����a�m����="�2�4;0z���*��h5�Q����Y �������}�F��A6y:%pb�Q�>k�X�50����*u��"h��Vlu����y_fn<���IY�/�q<�{d�U&x���U)���[�hGp�R;���������Uc��
MW��B�IA�5���T	�4��D;Et�\"$���a>�P���9V!�
2PK����9Pt<��+1�:��3�,�+jhkK��B�8~����#0��NC�O�R�s5������	#4)+!�vKV�U��^��dS������y�M@f���8>�f���0{_;���UJ}�(�-��;F�
�M��\����d;�@r-f[.w)�th�Mp��o����~(����yU��V��g��]g������>9��>#�nf�1�
�����NH�LVQ��I��4I)�TUy3q��@l����V|9�g>o����*	'UFbSYT�%|UTY��YQ�^"���p���p��h�h�����h��P��[�$������Y�4��W6��]����fA2�\�z�MG�����!�'��E1������^�2^e.�����8��PK�j�������7r/��e�V�>���l������9u�j(�������%�Hwc���#�����rL;�\�5���\����q3�..�eY�(m6��GQ!�F�+�����Z���(Q�Ds�j6�U�($	Ph�Cu�458~9����M)O)K�'��(.:��������CF'�H�o�&#����[��z�6���� {Yp9��w%H�
M����&L%G*���N�%_
g��8{	�q:�%��������>�����3�\��Wk��]�<�\��4�xW/3�[��=�S��1���a���f����8k��#dtj�kH���~oA^A��9�U�'%��T����I5��u����F)����g,����1)2/�&*������
V��\3���(�Bz�0���I�c�~<��rh�v���F�[Ou�}�a@{��������F��������[�`Ze��]��������D�kR����a���={!�\u���L�V���(���������9���c
�
����y���� ��������X�	Q7�2<m}����<���I�q�(���>����&�� �{j|��RO�fU�X]�$�H����J��_+��D�6�������VW5��RW��@�M1�[��P-�s'=��x������>U�S�(DVJ��(�BrE���&�K�����[�j�R���D"7=���+r�������`����8�[u���#���Q����{�#�������Q��K��$���F�s�k|-�#Pe�H���_=���s���2��P>�dn��}d�R��d���<��<�J�j����K�n����=�1�y��	���oiFBy��*����B^^���T��,�E�'
�a&����qy~x���������#<<�=��7�&��B�9,z3�i��$��b<�	Bq
#I�~��sT'��@���S��QF}�����L>E���G������c���}�k��;Y��/��������Sp���)�Y�/O_��j3�u[m��e�Z�s^\��ls�n(T72z%����i��%-���8�����#�t�|�%��J��P
b����ZN��n~�����s���V�����;�|[q)�i����s��(�����,����������5�9Ge2���y�i����2l<
�|�G:<�v���.�|e�u�!g���Bq�e��a���?t��m<W�5J%^��V�r����_|�qbp#A��{��d�=�*��h�������>���f�\���P����R(J!�g��N������;�w�1po�f>o�����-���<���l��~�7�Z�zv��E�$���Vpx�cXr4;�Z��������7���QBu+�[Q+���LO�l	;��"m�}���xN�k�>��7Y��4��r[��v�G���p�5O��J���uT|���;��rBpF���)�����G����[���f���}�a+�U���I[���=��p��/9,��r��3�D�A�^���~�����<OqY`;�ML?Q=�����z0�P��y�a<�%`�$�����@�����8d�=�F5�{#�eQz�$����cr�������\��$����a~7Eg�D�^��]����O��]�4�4/&(��ez�E^��1J�*
��d �����"��B���� �,���.;Kfr*g�ON���n1���/$����q��a�	4)K~�Lx�{��p'���{�N�j���<��Ja|N����I�z�������O��w!!�����g��
�lP,7�p�M�1�N�����~H�]���,h0��i�Z������K�Y���-j����M��7�Z����&�k6�Y�dg�&m�tQ� 9<}G�~�������=��������(�k9��V�;J�K��\���Jz:_���nA���2�A�w���F�^o��W���b�o%�C��0j����C=8��c�5�b���ED�/��<p^�f��q�g�<��!��A`��X��������+A����	�w'O���h�T.�����+��N�B����[_��{Jw����U���.|�����i�Mu24��A����kqfM���zn��q��j��lB0w�^�(���������$ol�������9��?��{���
�������q��.��|��1��W:�U�m�M�	QJ��$���������zQ��-3>j���<}���~����O���Xnl�YB�J*#��b�}�.��0������x�����/3<���s-�O���=Q���{�K�2����y��U�X�z��N5Po����v+�g ��K�Y�����@���tHV��8T�Jy8[�
y1����@v_����������./��F�x� ��]x6�"����h/6�l�"���������5E<�mfO	����~�����[��N�og_�
�+E�'(�
2h�S�:�2�������~�pT�o\I�~�ke��H���E)���K	x!,
5Y@��R�u�&RK��,��9� ��m@�s�(�h2�za��K^An g��Y�@�����yr���r�y�/�=�[
~�\<�~`xm ~����vR	s|@��n��y�m��ec�gm/b�h�c�Q��?��=��%��f&����p��y���h������W�6�_��Hcek�Wu�q"�T�������f�����^&#��*sB��������,���\�!�z�G�Li�4�k3H48�BW�����&,+*��#�)[��;9�5*�6[mw8Nf7z�!&i�y�g=4��9�M��0x�-p�@��}�AX)�t��_�O�W���S���9D�*iXS+'�����SS�#-�������x?X���0�=k��.)��y�|�pM_6r���/�)�n�-��_A9,^Z�W@��������'��K*���|��B���Z�%v,Y���\��u��
{�p��F�r# B�K��
	���9>U������S��2|�W%�%��[%�1���rg�Ta��fzJf����oR��x�K�R ?	:�V���Zr��'xmY�P�����+5��6�]�U�����bl��[/�4���@�e1�H��Vj�����Q�(VPU���=�/5�C�)�!;�wI�����������K�n��NB��p������	��d{��{x����.�����X(���BG�W��9����8*�y��GJv�Z��F���#y��9�����f�����;��U����=�j���c�B�E��xHe_�0��&I�H{!C��� Y����]taT�����
��RBY�1�� (��X'r�{P�7��o��+t��e�-N���	��[S�*X�d���M-��hP��&��_��Rb����a����j������Y�x��r�Y��&����Z�>��o+��_z�� �h�'3��,��}���/�N�	-q���v:�&n�z�%c�
�Fy��\��+�BY�.�(�`I�
h|���J�*�-G���L�ak����_�_�6�������o!O��2t��W'��q�7��7�}7�FO���'WH������T��\��J62��l�|z�C7J�p��@�l5!���e�q���;��:?B!��-����_V�~��1�bFT���`%�O�����������W�7���5�����J�J�l��M�A�Y�H��j�=�_������5��Z�u��
�����D\��U��]�����m��@�RSc�q������@Q��}�F�<�=���������������������t�

test-barrier.patchapplication/octet-stream; name=test-barrier.patchDownload

diff --git a/src/test/modules/test_barrier/Makefile b/src/test/modules/test_barrier/Makefile
new file mode 100644
index 0000000..9f330b6
--- /dev/null
+++ b/src/test/modules/test_barrier/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_barrier/Makefile
+
+MODULES = test_barrier
+
+EXTENSION = test_barrier
+DATA = test_barrier--1.0.sql
+PGFILEDESC = "test_barrier -- some tests for barrier.c"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_barrier
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_barrier/test_barrier--1.0.sql b/src/test/modules/test_barrier/test_barrier--1.0.sql
new file mode 100644
index 0000000..822915f
--- /dev/null
+++ b/src/test/modules/test_barrier/test_barrier--1.0.sql
@@ -0,0 +1,13 @@
+\echo Use "CREATE EXTENSION test_barrier" to load this file. \quit
+
+CREATE FUNCTION test_barrier_alternate(workers int, loops int)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION test_barrier_reattach_random(workers int, end_phase int)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+
diff --git a/src/test/modules/test_barrier/test_barrier.c b/src/test/modules/test_barrier/test_barrier.c
new file mode 100644
index 0000000..07127bb
--- /dev/null
+++ b/src/test/modules/test_barrier/test_barrier.c
@@ -0,0 +1,211 @@
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/barrier.h"
+#include "storage/dsm.h"
+#include "storage/proc.h"
+#include "utils/builtins.h"
+#include "utils/resowner.h"
+
+#include <stdlib.h>
+#include <unistd.h>
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(test_barrier_alternate);
+PG_FUNCTION_INFO_V1(test_barrier_reattach_random);
+
+extern Datum test_barrier_main(Datum);
+
+typedef enum test_mode
+{
+	TEST_MODE_ALTERNATE,
+	TEST_MODE_REATTACH_RANDOM
+} test_mode;
+
+typedef struct test_barrier_alternate_state
+{
+	Barrier barrier1;
+	Barrier barrier2;
+	int loops;
+} test_barrier_alternate_state;
+
+typedef struct test_barrier_reattach_random_state
+{
+	Barrier barrier;
+	int end_phase;
+} test_barrier_reattach_random_state;
+
+typedef struct test_barrier_state
+{
+	test_mode mode;
+	union
+	{
+		test_barrier_alternate_state alternate_state;
+		test_barrier_reattach_random_state reattach_random_state;
+	};
+} test_barrier_state;
+
+/*
+ * Wait at barrier1 and then barrier2, state->loops times.
+ */
+static void
+do_test_barrier_alternate(test_barrier_alternate_state *state)
+{
+	int i;
+
+	for (i = 0; i < state->loops; ++i)
+	{
+		BarrierWait(&state->barrier1, PG_WAIT_EXTENSION);
+		BarrierWait(&state->barrier2, PG_WAIT_EXTENSION);
+	}
+}
+
+/*
+ * Attach, wait a random numer of times, then detach, repeatedly until
+ * state->end_phase is reached.
+ */
+static void
+do_test_barrier_reattach_random(test_barrier_reattach_random_state *state)
+{
+	bool done = false;
+	int expected_phase;
+	int i;
+	int nwaits;
+
+	/* Make sure each backend uses a different pseudo-random sequence. */
+	srand48(getpid());
+
+	while (!done)
+	{
+		expected_phase = BarrierAttach(&state->barrier);
+		nwaits = (int) (lrand48() % 8);
+		for (i = 0; i < nwaits; ++i)
+		{
+			if (expected_phase == state->end_phase)
+			{
+				done = true;
+				break;
+			}
+
+			BarrierWait(&state->barrier, PG_WAIT_EXTENSION);
+			++expected_phase;
+			Assert(BarrierPhase(&state->barrier) == expected_phase);
+		}
+		BarrierDetach(&state->barrier);
+	}
+}
+
+Datum
+test_barrier_main(Datum arg)
+{
+	dsm_segment *segment;
+	test_barrier_state *state;
+
+	BackgroundWorkerUnblockSignals();
+
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "test_barrier_main toplevel");
+
+	segment = dsm_attach(DatumGetInt32(arg));
+	state = (test_barrier_state *) dsm_segment_address(segment);
+	switch (state->mode)
+	{
+	case TEST_MODE_ALTERNATE:
+		do_test_barrier_alternate(&state->alternate_state);
+		break;
+	case TEST_MODE_REATTACH_RANDOM:
+		do_test_barrier_reattach_random(&state->reattach_random_state);
+		break;
+	default:
+		Assert(0);
+	}
+	dsm_detach(segment);
+
+	return (Datum) 0;
+}
+
+static void
+launch_test(test_mode mode, int workers, int n)
+{
+	BackgroundWorkerHandle **handles;
+	test_barrier_state *state;
+	dsm_segment *segment;
+	int i;
+
+	handles = palloc(sizeof(BackgroundWorkerHandle *) * workers);
+
+	segment = dsm_create(sizeof(test_barrier_state), 0);
+	state = (test_barrier_state *) dsm_segment_address(segment);
+
+	/* Initialize state. */
+	state->mode = mode;
+	switch (mode)
+	{
+	case TEST_MODE_ALTERNATE:
+		/* Initialize a static barrier for 'workers' workers. */
+		state->alternate_state.loops = n;
+		BarrierInit(&state->alternate_state.barrier1, workers);
+		BarrierInit(&state->alternate_state.barrier2, workers);
+		break;
+	case TEST_MODE_REATTACH_RANDOM:
+		/* Initialize a dynamic barrier.  They'll attach and detach. */
+		state->reattach_random_state.end_phase = n;
+		BarrierInit(&state->reattach_random_state.barrier, 0);
+		break;
+	default:
+		Assert(0);
+	}
+
+	/* Start workers. */
+	for (i = 0; i < workers; ++i)
+	{
+		BackgroundWorker bgw;
+
+		snprintf(bgw.bgw_name, sizeof(bgw.bgw_name), "worker%d", i);
+		bgw.bgw_flags = BGWORKER_SHMEM_ACCESS;
+		bgw.bgw_start_time = BgWorkerStart_ConsistentState;
+		bgw.bgw_restart_time = BGW_NEVER_RESTART;
+		bgw.bgw_main = NULL;
+		snprintf(bgw.bgw_library_name, sizeof(bgw.bgw_library_name),
+				 "test_barrier");
+		snprintf(bgw.bgw_function_name, sizeof(bgw.bgw_function_name),
+				 "test_barrier_main");
+		bgw.bgw_main_arg = Int32GetDatum(dsm_segment_handle(segment));
+		bgw.bgw_notify_pid = MyProcPid;
+
+		if (!RegisterDynamicBackgroundWorker(&bgw, &handles[i]))
+			elog(ERROR, "Can't start worker");
+	}
+
+	/* Wait for workers to complete. */
+	for (i = 0; i < workers; ++i)
+		WaitForBackgroundWorkerShutdown(handles[i]);
+
+	dsm_detach(segment);
+}
+
+Datum
+test_barrier_reattach_random(PG_FUNCTION_ARGS)
+{
+	int workers = PG_GETARG_INT32(0);
+	int end_phase = PG_GETARG_INT32(1);
+
+	launch_test(TEST_MODE_REATTACH_RANDOM, workers, end_phase);
+
+	return (Datum) 0;
+}
+
+Datum
+test_barrier_alternate(PG_FUNCTION_ARGS)
+{
+	int workers = PG_GETARG_INT32(0);
+	int loops = PG_GETARG_INT32(1);
+
+	launch_test(TEST_MODE_ALTERNATE, workers, loops);
+
+	return (Datum) 0;
+}
diff --git a/src/test/modules/test_barrier/test_barrier.control b/src/test/modules/test_barrier/test_barrier.control
new file mode 100644
index 0000000..1cfe3bc
--- /dev/null
+++ b/src/test/modules/test_barrier/test_barrier.control
@@ -0,0 +1,4 @@
+comment = 'test_barrier'
+default_version = '1.0'
+module_pathname = '$libdir/test_barrier'
+relocatable = true

#46

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#45)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Mar 9, 2017 at 3:58 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

In the meantime, here is a new patch series addressing the other
things you raised.

Please see my remarks on 0007-hj-shared-buf-file-v7.patch over on the
"on_dsm_detach() callback and parallel tuplesort BufFile resource
management" thread. They still apply to this latest version of the
patch series.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#45)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Mar 9, 2017 at 5:28 PM, Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

On Wed, Mar 8, 2017 at 12:58 PM, Andres Freund <andres@anarazel.de> wrote:
0002: Check hash join work_mem usage at the point of chunk allocation.

Modify the existing hash join code to detect work_mem exhaustion at
the point where chunks are allocated, instead of checking after every
tuple insertion. This matches the logic used for estimating, and more
importantly allows for some parallelism in later patches.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/
nodeHash.c
index 406c180..af1b66d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -48,7 +48,8 @@ static void ExecHashSkewTableInsert(HashJoinTable
hashtable,
int bucketNumber);
static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
-static void *dense_alloc(HashJoinTable hashtable, Size size);
+static void *dense_alloc(HashJoinTable hashtable, Size size,
+                                                bool respect_work_mem);
I still dislike this, but maybe Robert's point of:

On 2017-02-16 08:57:21 -0500, Robert Haas wrote:

On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund <andres@anarazel.de>
wrote:

Isn't it kinda weird to do this from within dense_alloc()? I mean

that

dumps a lot of data to disk, frees a bunch of memory and so on - not
exactly what "dense_alloc" implies. Isn't the free()ing part also
dangerous, because the caller might actually use some of that memory,
like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
deeply enough to check whether that's an active bug, but it seems like
inviting one if not.

I haven't looked at this, but one idea might be to just rename
dense_alloc() to ExecHashBlahBlahSomething(). If there's a real
abstraction layer problem here then we should definitely fix it, but
maybe it's just the angle at which you hold your head.

Is enough.

There is a problem here. It can determine that it needs to increase
the number of batches, effectively splitting the current batch, but
then the caller goes on to insert the current tuple anyway, even
though it may no longer belong in this batch. I will post a fix for
that soon. I will also refactor it so that it doesn't do that work
inside dense_alloc. You're right, that's too weird.

In the meantime, here is a new patch series addressing the other
things you raised.

0003: Scan for unmatched tuples in a hash join one chunk at a time.

@@ -1152,8 +1155,65 @@ bool
ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext

*econtext)
{
HashJoinTable hashtable = hjstate->hj_HashTable;
-       HashJoinTuple hashTuple = hjstate->hj_CurTuple;
+       HashJoinTuple hashTuple;
+       MinimalTuple tuple;
+
+       /*
+        * First, process the queue of chunks holding tuples that are in
regular
+        * (non-skew) buckets.
+        */
+       for (;;)
+       {
+               /* Do we need a new chunk to scan? */
+               if (hashtable->current_chunk == NULL)
+               {
+                       /* Have we run out of chunks to scan? */
+                       if (hashtable->unmatched_chunks == NULL)
+                               break;
+
+                       /* Pop the next chunk from the front of the
queue. */

+ hashtable->current_chunk =

hashtable->unmatched_chunks;

+ hashtable->unmatched_chunks =

hashtable->current_chunk->next;
+                       hashtable->current_chunk_index = 0;
+               }
+
+               /* Have we reached the end of this chunk yet? */
+               if (hashtable->current_chunk_index >=
hashtable->current_chunk->used)
+               {
+                       /* Go around again to get the next chunk from
the queue. */
+                       hashtable->current_chunk = NULL;
+                       continue;
+               }
+
+               /* Take the next tuple from this chunk. */
+               hashTuple = (HashJoinTuple)
+                       (hashtable->current_chunk->data +
hashtable->current_chunk_index);
+               tuple = HJTUPLE_MINTUPLE(hashTuple);
+               hashtable->current_chunk_index +=
+                       MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
+
+               /* Is it unmatched? */
+               if (!HeapTupleHeaderHasMatch(tuple))
+               {
+                       TupleTableSlot *inntuple;
+
+                       /* insert hashtable's tuple into exec slot */
+                       inntuple = ExecStoreMinimalTuple(tuple,
+
hjstate->hj_HashTupleSlot,

+

false); /* do not pfree */
+                       econtext->ecxt_innertuple = inntuple;
+
+                       /* reset context each time (see below for
explanation) */
+                       ResetExprContext(econtext);
+                       return true;
+               }
+       }
I suspect this might actually be slower than the current/old logic,
because the current_chunk tests are repeated every loop. I think
retaining the two loops the previous code had makes sense, i.e. one to
find a relevant chunk, and one to iterate through all tuples in a chunk,
checking for an unmatched one.
Ok, I've updated it to use two loops as suggested. I couldn't measure
any speedup as a result but it's probably better code that way.

Have you run a performance comparison pre/post this patch? I don't
think there'd be a lot, but it seems important to verify that. I'd just
run a tpc-h pre/post comparison (prewarmed, fully cache resident,
parallelism disabled, hugepages is my personal recipe for the least
run-over-run variance).

I haven't been able to measure any difference in TPCH results yet. I
tried to contrive a simple test where there is a measurable
difference. I created a pair of tables and repeatedly ran two FULL
OUTER JOIN queries. In Q1 no unmatched tuples are found in the hash
table, and in Q2 every tuple in the hash table turns out to be
unmatched. I consistently measure just over 10% improvement.

CREATE TABLE t1 AS
SELECT generate_series(1, 10000000) AS id, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
';

CREATE TABLE t2 AS
SELECT generate_series(10000001, 20000000) AS id,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

SET work_mem = '1GB';

-- Q1
SELECT COUNT(*)
FROM t1 FULL OUTER JOIN t1 other USING (id);

-- Q2
SELECT COUNT(*)
FROM t1 FULL OUTER JOIN t2 USING (id);

master: Q1 = 9.280s, Q2 = 9.645s
0003-hj-refactor-unmatched-v6.patch: Q1 = 8.341s, Q2 = 8.661s
0003-hj-refactor-unmatched-v7.patch: Q1 = 8.186s, Q2 = 8.642s

0004: Add a barrier primitive for synchronizing backends.

+/*---------------------------------------------------------

----------------
+ *
+ * barrier.c
+ *       Barriers for synchronizing cooperating processes.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * This implementation of barriers allows for static sets of
participants

+ * known up front, or dynamic sets of participants which processes can

join

+ * or leave at any time. In the dynamic case, a phase number can be

used to

+ * track progress through a parallel algorithm; in the static case it

isn't

+ * needed.

Why would a phase id generally not be needed in the static case?
There's also further references to it ("Increments the current phase.")
that dont quite jive with that.

I've extended that text at the top to explain.

Short version: there is always a phase internally; that comment refers
to the need for client code to examine it. Dynamic barrier users
probably need to care what it is, since progress can be made while
they're not attached so they need a way to find out about that after
they attach, but static barriers generally don't need to care about
the phase number because nothing can happen without explicit action
from all participants so they should be in sync automatically.
Hopefully the new comments explain that better.
+ * IDENTIFICATION
+ *       src/backend/storage/ipc/barrier.c
This could use a short example usage scenario. Without knowing existing
usages of the "pattern", it's probably hard to grasp.
Examples added.

+ *-----------------------------------------------------------

--------------
+ */
+
+#include "storage/barrier.h"
Aren't you missing an include of postgres.h here?
Fixed.
+bool
+BarrierWait(Barrier *barrier, uint32 wait_event_info)
+{
+       bool first;
+       bool last;
+       int start_phase;
+       int next_phase;
+
+       SpinLockAcquire(&barrier->mutex);
+       start_phase = barrier->phase;
+       next_phase = start_phase + 1;
+       ++barrier->arrived;
+       if (barrier->arrived == 1)
+               first = true;
+       else
+               first = false;
+       if (barrier->arrived == barrier->participants)
+       {
+               last = true;
+               barrier->arrived = 0;
+               barrier->phase = next_phase;
+       }
+       else
+               last = false;
+       SpinLockRelease(&barrier->mutex);
Hm. So what's the defined concurrency protocol for non-static barriers,
when they attach after the spinlock here has been released? I think the
concurrency aspects deserve some commentary. Afaics it'll correctly
just count as the next phase - without any blocking - but that shouldn't
have to be inferred.
It may join at start_phase or next_phase depending on what happened
above. If it we just advanced the phase (by being the last to arrive)
then another backend that attaches will be joining at phase ==
next_phase, and if that new backend calls BarrierWait it'll be waiting
for the phase after that.

Things might get wonky if that new participant
then starts waiting for the new phase, violating the assert below...

+ Assert(barrier->phase == start_phase || barrier->phase

== next_phase);

I've added a comment near that assertion that explains the reason the
assertion holds.

Short version: The caller is attached, so there is no way for the
phase to advance beyond next_phase without the caller's participation;
the only possibilities to consider in the wait loop are "we're still
waiting" or "the final participant arrived or detached, advancing the
phase and releasing me".

Put another way, no waiting backend can ever see phase advance beyond
next_phase, because in order to do so, the waiting backend would need
to run BarrierWait again; barrier->arrived can never reach
barrier->participants a second time while we're in that wait loop.
+/*
+ * Detach from a barrier.  This may release other waiters from
BarrierWait and

+ * advance the phase, if they were only waiting for this backend.

Return
+ * true if this participant was the last to detach.
+ */
+bool
+BarrierDetach(Barrier *barrier)
+{
+       bool release;
+       bool last;
+
+       SpinLockAcquire(&barrier->mutex);
+       Assert(barrier->participants > 0);
+       --barrier->participants;
+
+       /*
+        * If any other participants are waiting and we were the last
participant
+        * waited for, release them.
+        */
+       if (barrier->participants > 0 &&
+               barrier->arrived == barrier->participants)
+       {
+               release = true;
+               barrier->arrived = 0;
+               barrier->phase++;
+       }
+       else
+               release = false;
+
+       last = barrier->participants == 0;
+       SpinLockRelease(&barrier->mutex);
+
+       if (release)
+               ConditionVariableBroadcast(&
barrier->condition_variable);
+
+       return last;
+}
Doesn't this, again, run into danger of leading to an assert failure in
the loop in BarrierWait?
I believe this code is correct. The assertion in BarrierWait can't
fail, because waiters know that there is no way for the phase to get
any further ahead without their help (because they are attached):
again, the only possibilities are phase == start_phase (implying that
they received a spurious condition variable signal) or phase ==
next_phase (the last backend being waited on has finally arrived or
detached, allowing other participants to proceed).

I've attached a test module that starts N workers, and makes the
workers attach, call BarrierWait a random number of times, then
detach, and then rinse and repeat, until the phase reaches some large
number and they all exit. This exercises every interleaving of the
attach, wait, detach. CREATE EXTENSION test_barrier, then something
like SELECT test_barrier_reattach_random(4, 1000000) to verify that no
assertions are thrown and it always completes.

+#include "postgres.h"

Huh, that normally shouldn't be in a header. I see you introduced that
in a bunch of other places too - that really doesn't look right to me.

Fixed.

In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a

few regressions,
Q21: 52 secs on HEAD and 400 secs with this patch
Q8: 8 secs on HEAD to 14 secs with patch

However, to avoid me being framed as some sort of "jinx" [;)] I'd like to
report a few cases of improvements also,
Q3: improved to 44 secs from 58 secs on HEAD
Q9: 81 secs on HEAD to 48 secs with patch
Q10: improved to 47 secs from 57 secs on HEAD
Q14: 9 secs on HEAD to 5 secs with patch

The details of this experimental setup is as follows,
scale-factor: 20
work_mem = 1GB
shared_buffers = 10GB

For the output plans on head and with patch please find the attached tar
file. In case, you require any more information please let me know.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

ph.tar.gzapplication/x-gzip; name=ph.tar.gzDownload

��L�X��{s���/��z>E�gF{���z�l9B/�:w5��qcccc���4\��<$g����g��U�
�4��n"�0k�FWe�^������o/��<�J����0q��R+�	V��O"�R�Hn@I)W$p�+>�`�������r�����m�{df�	�f���/_���������������'��|y�Z={}}w���N�0��fu{��w�
����7�o�������p�nu���Sm�f�b�hz�zw}}s�)|�e���W���o{3��<�">F��K�]:��e����s��'�gw�?{��������={���o�Wo.���^��X����Sx��'W�_�/V�*}sy������GH�/����Z�\���~�������+��n���\����������#$�?�������n/~:���R�P$��"����)�@0�JE{�����F�F�D������(���]#���+j������������zuu����,����������e��9�\3�M��3��0}�j���2%d�Z�w^C+������^���������_����?��z�����%�����U��H�Y��F��	�qw���c������*>��>���]�0�/��p��*/+�#|]�O����e�7�3i�6���W��jb��&-�E���+.(?��MV�o��c�iV"��S��w62+����Y���|���xVp��'^��� �H^79��<���t�C�]��&��|~{�����������oV�Mj�������U���T��Ny&$m�Fx����N���6����0.-c�(��vDQ	��`�Dzy����_��^�^�������O(��/q������������������du�7d��J�p!���H�[�����Zl!`�8�8!������7������3x��fz�����GvS�t���`�I�c��`R+���|�Q���1�q���Ae��#g��^�����������{�d�H��/w�?]|�2�TO���7�����T���p�����[\�P���)d���������x�>�1������g/�\�}�w�c?���8�$���������9�;���������.>5��L�(��|Y�:�����4(�M���/b�4s�h
��^[��U|P���H�������J���������W)+��P�,��s�P~��0o
��\um�l�� �=�7O�toY�?���g����Z������_go�����-]�'/v�8}����w�4p�)f�}D\n�����'W��r��4p4������?YY���}�d���.^]���:������Y��GO�<���p�������������O,��}P�]t-���4���)��^Nf��L����E�<w��b����~Z��n�uLIwd�?�ls�S��#��4��&�.�q���������gq�zd��('�t�Fy�W�����	
��'�r���B�P����o����x\�t8��#0����~Nf�����@h��������_�Ga�sx��u�X�����J��]�Z����g^����������������e�#��o6��]7p��__�y��+5��/P�����n���z� ���>��6:��a<z Q�(=WB1�gLn��bU�����������H�p�+GV�d-r���������T���/�����_|����������
��{E?I������7v������7�_�?�\��]���@3��u3`0�m��N�IC���2��F8�w�H�B;<��9�:��8�����7[a�j��|�u����D����n8���%�c��4�-:n���1��2���KT.����xWl��l�E�2�QP�����7R|����L;D��F�?��<Z��
K��5��)9j���/H���\����o��������%�jR��ujG�Z1�_��b2���g��lA��3Go���|�x�G�=�P�;�Kj2������q8RM��6�|wq���������(��7��>�@sa���8��K�������s�^0�����m��~�'�����OQ��Y�g��-����!'C�����*w���W��������u�����������|Q�z�������������<��W��QS�����c9i��������8���(��U�G9m]��������V����v��vH��N=\X>P
o]�5]|�Hc�!�0%�N�G*'Uqp�~�������+	�
6Zm���,M��y����~�������O_}�'\�|<�p�������=�og���3���3)�$�|g��O���o
5��MU���
mZ0-L�6�;P���M���M����_������T�+�?Z���+z���A���!����5�����9�]�x��[��M��������)��t����v���P�:A[w����������������#\��=�����8 �*A������8�M�.�������AK���AZ�SHEOi�S$�(-�}Z��
0��z9��.����!���^q�jvp-hs���ky��#�,�Cx���<��j�3�Kb��\v�Zn�`x�
yg^�Z�����i�����6!5���8��h#������!i�(��c�|YR�p�I�
��n��`�Xb��o���C���@�ITd�wN�s��Ei�����f�v0���K�
HR�Y��"�Q$s�(l�PD"�����/yBu}%��'�Q�W�+�n*��f1*e�)��]���(v�����������)�,�Yj�����T����@����dT)����Z�/���:y����X#U�f���1���<��RR�b����j��+Q&�H-�@�4����������LV
Q"+E��I��iq��2U��nv��`��UB�5�P�alD��Dj��H&��	i��	�REfPk�j��lhY)��=��:"xg�RP�T�G�����q�����h���
�x�����6��(����`3��,���z���^19�e0Q���I:���n���v�p�7EH�vai9`3|
�ZEva2��#7_e��W�������<\���]����������N�,�tM��[�w�A�n#���R�����I�1�OJ�������$%M�JX�-6�!���M��v�E��t�w?�hAWNz�����hA�y�|�>H�Z����_���$���G���������b�W\W��D��g��>�����'���v�R=J(BE1MGVn�_�d���rd�p���P�"�;]x��L����"m
�S�\�j�
��������X�v�vv�-�v�b`S������sp+���_Ju���e�&�7J��
���^���}[���v��87 ������_e��k�����1����?�������W��m�m6�YA
6�b"Y5�uC����g3o���I���6;k�����)�)�8����-_��
��h������/�M����&�LR�2m������8JZ������� ��A
r��A���GjfZ���[X�L������H���w���P4�;�=�R������,��+a��S .�����+��yFm�-��N�
�MA��������>$�L���d�K
��l:�1	4�!�',��E���f
�_�5ob��O�Mj�N��"[�����_��R�E(k��	!�!O.d���H���t9}w�E*�������7�1�u�.�^�N�W���p>��f�Mu�I6�,��<G7l=q�JA!�����n*�g��##TW�n�:
������W@�@�	���A>���[��_����kNf]f��F�k)�{��^~�����bg���E��z�R�B�T�(��>�����dqF.�w'���fT����>�6j�S�x����N����
V_���,�=/�7X�
?���"��hf��n���#��M8P�K�gVo��Hpa�J�"h/���e�&��jhU�"
�8j���}�������t�^k���q!�1���v���I�x6����\���N���,:d" ��$$�%����$d��LMB�+'���v��>��h��Ks�5\�M'QN+������iX
��"��Q����\h������WU�E�����������w+M��+A��w���*?4�o�%�.UK��H��WyG��&'8L��c��G~A��	5sY�����_��SQZ��5>.����gS����>]�_�j�OZs)����z����L�%4�c����<��_�rM���<]`'q���
H��{�����,�SH�dN�[��������t�-����/	�I3�Z������R��p�3����s^�n���[���
��G�/M)��VK��Z��.���)���V�+�6��Di�S:��H����A=������t��?��{Q�g�9+�O�C�����"-�?%����T*�~��?�m(��W��;!��=F5�5q��P�G���j��r���s�?�H*�(8�$�s~�����*����BG�J1�]p����g�������[�V����g������?���j�(�_�h2~�L��C�����S�,�)!�(;C)>��v��MG9TBO0�rXS��\L2���?��X�XT0E����7X��3+XN@�3��/X�
���r�(J�[�Xo�A�������_*���2R�',���P�}d7Z�_��I�_���.�M�����������%�������v
�f3M���_�(�
�$)��H������_|k`�T�/B ��S��N0_�em�7�Q����"�u�H(�l4�	xx���	!���&��=P�0�a�_<K������IZ�����2!��b��u�^�_v�A��P��K����x������@���t����J9�b��������b�	�(�\�_�_(��X������2���p��e�����|b�i����"�X��������`���t��/f�=/�/��QB5���e����xF�Bc"���0pj�9uJ��9�@o������!C�.���{�;Lo�����}5�p����(&#��d�8�P�vj6�g������r���
�O�������H'��2� u0��(�����!!Y*�������#��=A��DY��I�;'�q��z�	!���Bm��;��sB�OW��<���������,eP������=���M��2^�d��*�J��;;����8��Y�^2��>�l��)2���T���/��&�P;�;��&���4��BY�>I ���J�i��I�*=W�t�(2�K����@6��$�1�$���@�r�)��D)1�(�Jx�D�q�.�r�s$������^|^J;��4�>�c"�i�V#��v�y2eb���Q=e�D�)_��z����B��Fq[�
bK��%`(�.6���}>��
J�]��u]s��&q���{'���wbcJ���"���,��N���Gs��dwy2�p��Bs
C`�YA!�{XK�����;^n��M��"�s�)��GojJZ&J�-$ �*���f�gT����xf=n��
�?�����?���A=W�pZ��i
�C��T=�h�l$��'vgsw�S�����!+bY������9�p������8�d�Z2�\	'��C�?d���&��������,��@g}�t����?T��R���DF���X�H�_$))������,�o�LgD������C,O������"���M;B���w0�?��1'��������������b���CG�O*2,F��R����L���Q�@D<HCs��?j�`V�I�i�i���P����m�?V�P�~���U�f�o�����_��b����Gv��g
���sT"����J7���St'sw����?�Snx@�����i6���'���Q*�|�+�����������Gy����3�F�������H����|+E�2��B�X����8&���	�;�I�?���������8�]x�@^�����&)�Z)B�0�$������C����8�}�G*h�<�tYKn����������&1���m�3���HW��m�12A$N���Z&m��h�h��3J�gJ�����M��# ����@0�Y���)����,����&�h��P��f �j�/@�M7o����������6�����������`�����c���~�� yI��Y����2��Yo��Z��k��Mp������l�6Rj�"=����Et}���
����H��Z�4�}3=�1UL��T�Fl�\�lh�j��5R��Y�DiE�:
���NheS���*+�rC��0k��N�N&�w�y�Z��G�����|�|u���&\�Zc�@	'4�`$ 6C�d�2i�nY��5e�F�H�z0[��Q���>��c���_���~�>��S+�?	�=���*��878a(E,���)G,��	#J��l�V���q��L��J���.����As@|� ��H�;^��k�hJnh3.���p�.RJ�
����j��������6����n� ��:�����/������c���D#�d�G��kU��#���Y���E�
w�k�j������i���~�2��E{�/�pp�%���� >���U���H�=J�&�����E�+���wm��?����U
���	�/8K�/~@PD`�r\����f�L��k��H��C��%�<��U���T��������ub
��@�E3d���{l��@Z�i��w�/�9k�o�0����j�\tHY�)��/��{��������m�_m����
�G���A=���h����jcA)@�Q]������wp�	;�R�rskAsF�z���
:�(n�
�@�"U�d��?����H��v..����q��;����8Ae�����<�aw[���-����1X��C�&JY���[��
�g��,45�,��[>�=�V�<�,Z��2Y.+���E�Py�����b7b}6��f�9gA]�3��8���e���Y{�?�@�a;�������>�~BK��F�87�� �J���!�r��Q�dj�p�x�)���<J����(�����Md_�b������/�=�6�O����]���$�aLlo�P���V����6=���@����� �v�7���������-R�����v(�T`|]���,4��e�1r�H���L&�H�i�w�����IM���8�_�_�h�k���T���y�(�Z�m���)r0���������|7�{�6�G���*�v(�:��M�� ��Qx��,�����[y���c[�=@@=�gS��\�N�t��R��s��Mv�|�z��W_~��oV���������W_�����T�����������t�jg#���U�f�lHyR!)L�\�$�9c��Tt�L�IkbUi�\�Z�zS�X���G������#cr3Faj��g�p%�6"����]�����2nf��Q9�����77:zb+��T6���X��*Z�p~��8R���r�v_X���;xC��3����On���(M��&R��R����$0Cv}��Q���EA��q��������5��B�� S	o�KjBVl�.u���z����|��,R�vCl�P������(5�|��Vg���O
Ne���5�����9�)D����o�z�����^����	���W�`�V\W���+�O����SPT���EZ��T9�������� s���
�t���P!������\�Q� �A�o��p=?�����Y��T��1?_�v�W��g����lJ�/�������������������&$�/5uyg�����>w'[��#9�3*
����	������Q�$T�P[L�����m��N����i��Y*�|�������o���J)�dA�u	��8n*�s���7(P�u�����yH�g���tZ�9nF����Yz&�!�g�S��?O����>Vy���
��3��vf|��Gu�����L��{���������&�W��s�b��?R���2�Y��+��?�!O	3�����<5�3-��c/�/��<PF�s����$��?��j����Z���|�����?��y�4b��J�����L`eT>���B�1qrr������V{��$����F����8�����VE�E��ya�g�{���#��L�{�c�U��I�dx�~�G���r2A�E�T�O���*-���hay��s�h�]H?t�g7��o;������/��}���6-�_S������������(z����D:�����:1e�Y�?p[W�y��T�?5�m����L1,���tN�CS!�]�Q��d���c,���`��Lu;g�?���>��I���������?�H0��7M������C�8u@���X�8n�8�����,W
76�mYS�@�?$E��"C����H,���D�����?d�)L������������,�%�<T������Q��P�T��d����*a��"�I{h .9�w���@l��R��,����?WeJ�^@H�JuC��h<R@�x�h�#H���3:�������'��+�T��w4h!���}O��[���)����hn���0���n0���m���������\���j��;j9�L=`��<�(q���L��&/����/�xf��*���I�DY�sJhwM`��1���&��M��w���F>]�'�lJ>��^C�����������{�U�7����2.���TM����G�e��+[P�0�m�]
X�Dg��/(�2���_�:�5!�Rz�Tv3�6����7^6r�y`�vB���]��6"�BI�GGRG�T~,��>�f�K��p�����2�I�$����9�7�	$�)���!����$2��M�W
�*	N �r��F�K�wT������v
��H�����Lb�1b'�+%�5|��*%G��^r)��*��3j��F���e���UH8�)���3���l%�����N����N[7�4��M��Mg��@2;�L���^wx���m�����	�CD�0��}3Vk���Z��:M����wE`k���(�R�j���P���qnV��|���]G�v0�����$@j��QM��n��^���V�h�u����u��a���Q�����mC��X��B=W� Z��7h��o��� -ag�Vc"����Eg:w����4I�7�mvSt��q�kP�Y�	�Z���2��
v��xy�}�����QL�����r��}��W�����e[�&�������6�7��q6^
���~��������P���C��\��B[����OA�K�=)1)5���]����������P��<�����T���Te�����������nW�^�Py�������X�R���\�%���)*�6g!��N	�����W2�@�|���1����/����M0��e�����Q�a�"�<p���	�a��4�[�8<0A|�7E�0e�a��/���)�������H�L�!��YK�w��o��
��/���qL����*ZK�w�����|���H'��M�"�K���e��C7k��g�";�/����dO����v�9��������n�w|��60\����U���-��L�]������9��+������
Z
i���IS�;��0A��fk��I���^H���%��[^��Q^�/��w����{J��"�-,j����]H��T���3�U�G��M�t���@���>����68k���u��hn#M�7�-)kS���#��F	!�	��$&����&��p�����'��/�9G�7E�y-��F����F{������G��m4��Z/K�������|��o���_����a����m��o����L����.���h���v�����Jy4���*P�q��:��o�/��{��+q-��4A��F�����Z��������i�6�$����e4;��Z���Rl,K���F�
Fe������GX�;h��U��������My2�B������E�Qg������P1-����hC�������s���WF@y��"����������;�5��/��T����p��,/��v��H�v�KAn����-�{�o�
K���r�c����]c��H���^�$��8<O�������
�Vq3�w���6����2N3Na�R{���	(FQiQ&�������6�e������6&r��,~i)!�xf�w�����lK���"�e6�Fb���p��u8jp:�k\5�9�e�l�f��!����G�x���M����'�����4o�w3����Y�h�]�����P�AYf`��i�z3b
�xsf?��<dUyrA�`�,T���*���&�����f��(�����m��I����A��7O0U~2E�*�����������e��,K���NNeQ������R1�6L-wt����&�vh�>��d���f��%5�359�4Kj���&o��t��Y��Q�B��e�5�����gG(�+k������:��$�OC�ScJ����0�i[4��5�i���n�8�t'�ti��qNNuF�/�t��=&��r'\(U�'=�������
�<��d���	��-on*�-FJc�@H�6-G��g��8����7���
�A0!
�1nL���bP��B���{0�?�������� �?����������<�����s��PzS��	��\t*r�������92BB�J�X��	���S�/����J��v�h���}���!1h-���#�TQ��^���CfM�x�������D}���)6�S�`�;A��32O�,�e'�8����L���!j0yQ�)�����)f���Z���H��L��%�=[�\���G�,<A�(�X��?�z:��	���}!#���~���j���S
Wb�L�'w��T�B)����*��
���������){s���?u�g:n�5����pP��!A=��AJG�^���B=�PB3-��	��������M���Y����N�������yG���=�����^�{MO�9���l�N�g��lv'���3�s#��}n_���%��6��b���
ZiOn�S���|���Q�Y���^�7�����(�J�������J��i��c�u�A�9R�2�m����7�Z�h��~�uf%����(f�!v55�6��
�>9�t��nH���;����$��T]n��lv�(�J��
����ML"�x2e&��?'R�"�\����A�P+�YD�5�&<�u���sh���@�#�:	&X�:R�!���rZ�3I��vW)yI�Zp�D�+MP����"������<���������W�������_��h�����:}�����(?�����������+:�o4�+�k,������N���N}���t1,8�#Ut]���fA6!i��L��a%������J����J���m�YH!3'�g��E����3
�Q�7���@���<T$�����
�i��bCMg�o3��&�P.~�0�R��t������4!��!\(^	�]l��t���Z����>h����@
�N6�r�
6}p���Ke2�z�m�i�G�,�Z�\�	F��1\XQ�d�3�� -��R�	]���.�M;�
x�	�Ot��|4������U�����X�tR��k�rSm=����>�� ����d�.��)���oN�����.T��:4�P��;m��DJ[OcEH$+�2�V���?Sx�E�(��F���%�d�E�����0�wH@�������'�t��"-����K��c�s�N�~��>s�ws��(0�@����n2�"5"M*��IRS/wD��QwS�7:��Em��C�0��v��}~��	���-{}��z"������?&�����?�X���Bru��A�UjZ�:P�&x�,��)�e��{�D����g+BS8f���S��4�D5D���>���4a�C*��9z�KCWqjr������X��C��MN�����Y�={B�v�����1?��iQs��i=�"i<�OxG�
�4������ �A�<�:3O ��w�Sl��4���8��a�^x[|Z
G<j�\���j
��������0�� $�BQ�A��z�z�X�#���$���)�e�������)a����?W|Y�����NZ����i��4#�[,7�����������|���/	��o��'h��O��/�'S}��7� N��/P�[E�65�~��>���@Z�G�� ��f�_��!w-x�<UZ�m����K���2)��yb������	��A��]�r8����c��D������J�7|��c)�F{,�J����x
�')��hL�r����_x!I��X8��&��1n�l8��q��8	w`Np3����Yyp�B:��T�Q���������
���.5�c,\��/	h{�7^��M�_a��>��B����o�����<���)e#������Vt*r���8��[`D��(��V��A}������1k���J���S��S��
huxw
��Q*��8���'�\B�[X��K��$��9{.�&��6���}{9J��������c�G���M�����)K,��W�M���B�7��z�W�MP
�[��r�Qv�u^��,S���d�8��o��|05U||)�����[�"P<\��n���ZZ�v�+<Z��y��%X�Zy��/�k��GU��F0:3������[��e�#�\�o�S��H_�
���k/5&s�q�7������;��U����gO���ZP�,�f�n���eP�[�-����������0���o4h
���7M����L�Pj������+��r�4�H���8�(W��~�^���U�%���$-aO�����'�c������J���QD|l����OP�r@|�Z�z	#Ei�����U������Q����u2.>r�?�C|�������B������g8��S��G�4�<��N�PX�s�?��[�q�,�Z��Z�f�J���%p���3�dj�e����������'�?��2�8���xx���S����xk&���F:~�?�i���?���>n�?9�"�*�\7��e�)����������_���W������^�]_��7w�����
�D�����G�%�k?t�|���?�+�7���/��@������D��/�]|�����x�?������W?^�������?�x������C�������x�|�����'�{;}��_\�x4������Q�����?>�����xoy����\�����_]�~{����O~��]��
��r�Wv�V�+Px�]	X��<�T�/�J�oi�p+TG�O�����I����V������~������A����������}�~{���7��xs������������n��~x{�������������W�w���h��q^?���?}����O)��)~���
lu{q�&����\��/�h�����?���S��}�����]�2��\}�)2C��r�o�]����Z��~qs{������x��'��p��?��_����kz,���I��O����_}
Oj���+���K�������s �W�� ���kD�puwq��/\��-� ��\_]<�?V�[�_���w��/V��/��
�}�er�������S���.H��r����Q��AK��TG�N��S�g���"�7���8�����A���t�d����c�N���T����yn��L��S�a��S��e���������|W���j`���5YaSi0��-�@[�#F6��j(w�!T�t^0�+^�kM{�&�hC�����i�)�{a5g�kG���
T�Q�����$j��M����T����vs�,�Ajf�`CE��BII���ODA���Yz�������$�ino��8>B�&�gM9����ux�>���J��	�i~��Rs�.�J�`,�)�3�
/���7�@�"�&�����m���@��h���2�j�~���F���3?����mu&�$��A������M+�r���c�CR=�e��p2F�#F%Rk�J�?S�k�����F��)��u'9�X�|�Uw\e�����:a�����<��lT�(b�h����H��\R�hj�{���5����k=a��FFl�r�����F��;n'���i(�s�o)�q'���M
�'/��������;gF)��V(����5LTW�S�,
���5L87��Y���n(n�Ect����?bt�z���GC!�[>t��������G��{�����&��6�J�JH.�v�^8{�w��*G����P|<c������l��9e���ocU�IZ��2a���-<U�S���=1�4�i�7*�	������t��������3p������%����<3�sIE�C� O�������������w�\���k��'K�����1KT+���FX�s��o�����Pr����	���L�f����c��_��f�w�Z�r��FZ�����VY�y�����
���?G�7���~�����H'���s�����d�	#�2�;~zPe�>t����E��Ic8j;��dO�~��y��{E�@��7��ak��:�$%�Z�?� y��%�g��w��Uci5�#9���s�����3�QSe��3���G-������JCiy�M��T��>�g
_+]�,;y���U���Yq�-:�����;K(�gN����g5#��8s�:T����T����
�CP
:s�P�����)��w�*��#�nJ*I��?�V{����I�8�V����?��qG=�Qu����p��qGi(@I�y��GYk����6?n����[y&���b��w�#����8H�
�Pi>)�~"��A�'�a-~�Sg$U����B�[���P�<~��C�����|N�-�_eD�����G��H�CK�.�{�_���^1V���m?��N!w�����FF��Q�|���S����J���9��������0���������'^��z�y2�c5f��l8������v����c���;`���g��(Lm�����~~7�>���)�������+����QV�l�V(�do;���9���[|�Qv�uL`a(�o��b�
�a\
O��^��L���p5O��p,������K�VX�����w��}�}�	��K`�����^m�����b�R8��Aj����+�4E���gi`M����;Ic��q������v��,�����~���|��%�'<�'����v����-�Q���v�I�?A`�"0�����;�.�A��1D:�������G�]��v�K�$��Dy��%>-�Od�����&��4����	!��QPX�I�<�Q����.��Z�AJ3#G�:��A���N�
C�^{C�Cc���jB������t�)��R\�M.��7M�5���������{��Kr�}9`P�VuA�U!S�i�>�������N��,y�F�|�dg����j���M�qw����n��f	�d*s|0����������y�����-�N;F�����]�?{���+���s����w�9��c!S�h�l/"�9�
�N���U��y��p����U{�Vr&�x����.)Z:����TI�����Y��r���<F���^��X����S�\}�S�������������W�!�����/j$S�����/Vo/�oj�VU����&������#�4<k
���r�iG� �x#��V�g��~��U�dzV\�
��v�T���(~�	))k�51����4���a���g��	��n��UiO��YQ�5����R0�.����V��8��K�T/�"	Fio}���>�����3v�n)���	U�@�qkd#����O����z0��<Oe���5(�������������"A='�PW@��j���<�jN��5|~y���f�'��Z��� :�y�5�y�����0�����:��"S�7&=��Wc��^^��}��2>����������"U���N�L!����	��^�C�����Hk����	kC(�*�)2'�fz�P}�����j�;Kf�%/�d"(Z3��n
*U��d(�#��S����b��^7x��U�2r���z����J���F�6�����������������Un�v�b�i*��J�8x������#�o2=�FzFY����H"�-����V��v��Ro�DP��&���*b���f��C�jzv1�sxv��bN���!J+��Krz�v�Y���t��<1�Z�	vT`�������	� �[,p�@��5�Wx���R������JW�"��1��o��EK���4A��U��?4�Mm_��ny9��8�H��=��O�P�,�\�f���$-'(��� � �T����2�����X $yt���?J�u�����O��vl����q�I���E{�q"������i������x���a��si��N	?�����P����`���Bs�������R+Anqew�b�B;~p]�R4S�r�K)�d�@�8��3�)��(�d�<�!r;�#�\�!Z������k��+N���=S�����D���e��]��#�HKi��K�:���*m$3��n�����Q��z��F�9,Q�Ki�$�Bq�K��Z�(~��}�����k����t�upl�k�ZJq��T������t����9�����j������o����K{�":�`#�s����P^%5�=��.�uR�(�f��p���U:����t�Wc��H��)�n���/
�p�JT��~�J�-��
�����h[R������1[�s���7�1&���*���a6���~����wR�X�����f�{0J),��r��/���z(�?��OY2��[)x�4�)f�;f7�S�#���[V���w>6s�>C+<p�����.�O4���Bnen0�j!���b��-���,%����
4R��G�c��!/k�c��Bq���)fa��"U���>���h��6i�CU�m��?N��;�D>��PJ��uc	
��wx��ji��9��&��5Hf�oM+���e���9��XCeN�8�q�2��������m��!W*7���\��n1��4VSB�*�����_*�
E��]��s�-�'>��1
$"�m�2�H�P�GN,�H������\������3��C�6����+�+hfj�����V����&�J`�V�co	,�4.�E��\)���r��DAJ���d�V��T��m!i��G��A�������I�NQ�}��T���t)��N����L����i���5����*�x8s��8E0f����SP��OqJ�:�	�p�i>�S��I��8M� 1�	d-�x�Fzg}���i�h{V9�I
�s����}���
��a��;m��1��O�����H���M�������L�4w���@tN�R;`.�#������
�����t��x������K5!����J���������������eU�&����^h��JK�]������n����@U��W�:�Rkd^�z���-Tmn�r���}��Ga�"�u�CM���U��HF��$��U�fE+��H����J/]��8�M/���8 ����B6v����L�Y��y;�?!e���x��PN�	]��[u���}j7b}�W��4�
5�U���M r�{�!��Ae���Z��X�W�G�5G��`6�x�\��S�RQ������	(�H}�U��L��K�-��A�5e�T�u-���d�r(kf�.�nr=O-���;L�"Z�^�&�1g�Z<aL�
���M��Lt��(7f7�(�����;�p%���9�����Q����*�'l%���[�!�e�X9����
[���p����PF1����wrH�F�!7*��}��������F�����'o+��&��]7������W�?��2x}���_�����y�����������~Q�E{���^��{O����=��6l���~�����E��I����������2��#v�S_��"��7j����I�De:�%
X�B�I\�{�S�����M��j�f6�;�y3�.U��%����
i�4m�����K4'OV.����iS�^W�*��e5{�< >�����������j�[5�ncnT:��;'sO��P��;y��������C��o�W���5��a�w����%��.��PJ���Z4R�5�(&��+:n���3�s	9�m��[4@�,I�q�����q��	KU�ML�� ��o��apBoh�~0�1�S6���t5e���s�:F���+����u���������,��{�A�h-���4���RdA�M[>4W����,���.#FV���X'�;B����3��u�����<�P2PR9�iz"������TVY�KQ��X����<T;�xa����A9P���C�eZ.����C���$���4,���I��<d�������#�6�����h������',�TVIP���<�R����8��*^4�3K����#x�C���Q���Q��f����G��zx.�n��)��Noy^���IM���a�0+DK�OE���y���5�m��j�K�����y�J��T4��BU���Xn�e���:e�S�=��0,�{��"i(��)S�����(�,Q�Zqj�_������'����Kx����)�������G��%<w��
��G6��[�����������T\�"�H�
�PYU@MK+�I�!ZO��dv:��	��c3��A�qo�"sLj%���Z�4���O�eARy:�X����'u����"���8W�����_ ��h�\�^�dL-�VrW.�\u���+E�@��F~����.�A�"�rZ3�x#�:2��>�J���X����.�X<|x-a`����
�������-�<'4�|#��@S�MW7{�����
!��d�g3�p�c��H��9��H������e�	M�:q4U��,����a|8J�_`�������2)��O�/����^h�&�����;O_������P�H�.������q��u�
z�f��zWZ���Nc�����Q�"U�z�3��������_|���lb7�m8:��R����Wo���!�����.j�w�5{U�&�C��RV��e�\�`$W��J#��j��z��P�KV��l�����$�+8gx�T
P���KM�-� i	���o#
/����J�j���t
���Y(#�6��~���[�]�}1dS9H�fwnz8�%~h�%�5fA�Vv��	<����O�=�����ZK��|�N0�m�B�=e�R
�7���T@ul�)�Cu'����(��Kt�;
�z��z��o�_={��@���?�$XA+`+�����k2u�_��������[�	&����MNG��A�J���W�������O�>����o�(�a��������������E�i����0	�-]���r7���h
9�h�T�
M�9�z&�O��OQ���Q�}�"nv�|��PQ�/�u
9r��H�#U1�������WO?���M����5�������~�����!��<��������g/�e���+8Sg��@�)���Iq&���C8//~�����h�O�mRJ���fC��
�����B9tVe\,5w%������-����/ ��E��
�uF����,�7(*AH�r�-kt~#R�s��Y~��f��Z�_R��,cI�2��������j`!Axg	���������7�����Mj�I���6vy)�\l�D�^y�t>0
�A�V9�8%H^�kl�G^F"��Fy������������4���}V�ihl���Y��!�s�.����&dT��?��yeT�>����$L�i(�s.��~Z���������@��O�sH���S}�����tcp��u��B����,�����_�����������S_���E���V��\��g�
_���x��K(o��t�pm@2���Je�yN�"��Lk����/\���J`���N7 
�ZIY�R+o�5a���CX:;\�m���"�_
c@x��=<K5(�VBF9'|!!�����i�/�M*�\�v�V>69�o���r���v�c��YZ4�C����!]8E�#��3?���OT�4����p1v�$
��\�����gr#���������L`���O+:�ZO����[���j�������V��m�
h��W�b��n�SM�b�T
�w�&���}{7]!�	q��"�
l���_xI�]��M#�����/(�h�{����-`�<�E�xFq.w����g������r�����g&��_/\�^�+�|wm�Qk�������>�
d�.��s�"���ZI�+��'������,�e���	NX�l��r�����@y���(d�EBC�����"��|��a����_��g4z�o��y+M��l�LS$f���8�?]B�2_b3�3>��� e��S:��	��2�K�j�@HC�SN�	�?����!�U�%��"�s��E���G��,���)����'��������/����T��x��?;���i�E3=�h���V�(H<��S���k�9��Ws��GZ��kv7����������<����6�?�V����E�f����"X�����s�R'�3��n�6������?�>U��gC���
*�����^��90�����4A��O@�A����|�����ND��h�/����V��bn�`P��6z4��������@���S����K����k���
��V2cQ�7�
��#�Q��Y��'��V�N��s����MV|M��x�����o���q�1�[0�7Vp���]��#��O������d*�?2��K�����]g<-����	�?9��;�(06k2MR�����}5��E:�H���
���|	�V�)���b���M�>��O�&�b]0��'[�IrIP�>�
\�69����������� �����j��Q�$|���� s��R��_�'*�ac�>Gv����*BC<��gm��C���%0��V�z=��O
�)>Aa����9T|��������p�szg8������S�������C^d�J��K��:�S�	(�3"���!W�
�F<s@/�v�E��@�B��V����c]e>�
P�^U��?�
P�E?j6������
P��������U��x�k�Y\C�
PH��d1j�IS��#j��Ii�
PL7o����S�m��RU�������^h�3����;I�_	�
.��x���������0Zk�=^���8��.`�M����#m�
�S�^3�������`2(��a��aApI>2�,J��BX�����k��z��Y'O��-�$cih� W��V�v�mm�j����K\��Y�'!wq�=�9x�X��Gz`_X�%�E[o��R9���:�}�6*u**������z�u���m/���S��!��RdBo�)����r��`�*_0���jk����r�o�7)�!z���6�9�q��"��>Jd�c]!�,�)����p��'���)tx��"�].Q ���k����1�#��'���J�Hq@[�N�jM^�)Px;�����ls�OG
X��G��t��^$h��a7
{��������K,HU���U�������I$��^J
����Hk�XX���c0�i�<�������*�`D�M�����[�2�3�w;����|
���*h�u$]���{��a1J�;���Y�V���x��7��������}t���\����I{lMA.A8>������H�mAoI:J��E�al&7eQ����#e����=���M�mnRJom�-t���bH���d��K84�9�pj�6��3�sB2n�z����j�e�����.��^����5��2Di�Mv���-�����u:/'Q�3Ly�����������9y�0��u�{��x��g���)#`�k��'���kk4h�*O�s�=�n|O���H�x���/����N�*[]li/.g��-+K�"���:�e��E������I �&]�����\���z:o����=Jx�@j�0��E\Jvq)������YD�,��}��������4	����Q�U��<4�s��,���]F:�������:�(t�)��`>���3�O3GE�$��1�Ky��3��H�%Dd-�z8"���?�(��o��I�������W��Q�^��">n�gC*�����C�U�������E�W!5�����)J}������:|�gE��d���W>��?O������^4�3��X�g�M���V|lO��C�3�?+r��"vi���YG��.,N�YF�c�u���q���42;�aj��G@7o�������?[��W��m�R��^��90���}��
��`Qugd���d��tu�R��c����(��F������`�������Ia����>#�d<�i�t��d�AB��*�p,���5����
�%�b���
��Mk��^�O
��&�+��{{y�2�����y[�s�+#�2M�m@~D�p�P`5e������e���P�i9l�
��O? ��IF�O�%��-Rr�l0�f7(�e��"�#��
1�����B[��)�NkE��v�%a�Rh�r�	�v&T���p�y�q�pj��`����l1��C�l���{�v���+5<��zHS�p����X��#���V�;�Q/��@Gvf�B���,�Ca@����*PEj��%w�;��$��0�H���A.s~ �Tl��`uA!����~���|Z�R=�-�
�
p������D�[J+A���R+��i��h�u���5U�����
��������	N�����4�~�����7gcvK�0.��)����uK�r�O���$-��I!�T�w2!E�Y�_�z�������>�����������I���l>w��OVC.q�6\���j�^�e���?�����p�Z�h$��Fz�I��'�������@������hQS�����hQ��dQ��������� �����^h�U�����h�o�G 9��DU�\R����@�}���"F��~���48�/�';���[J��E3V���4���Q�@�FG����$������R�D�O8"�o��\RtE�� f��������
e�nS��$�H�
�[yI;5�K�!M������6�B������&�O���J��C���wKOvu�-A��G�d���U�������*�]��7i���L��Z)\}����<��x���CE#?s����Q+���fo�SG�&���t���<�y��u��0���	/����[DL����w|���k&F:2�o�Q�&W_�Y\B�����;@��P���3���[�Rr5�g�����t������7���WxK�������m�����^h�&�����;O_����$�P�A��em#@�HT���#��>>����D�;E6:(
������~~������}���/�_�������nJ�p
�����W?��_�������"�����{o8���*J��!��)�/-/�%�#�E��	FX</E
V�j���7�����7[a@;���-\��>�������)qr(�tp�6SL���1��2�|)��+9 pP�������P��^����3d��>��iD��E������-�,a>oA�v��b�D��&�������������_L����j�������#O�W��	�&ksNy�
���J���� %xG�C��Y�}����goQ��|���+�ol��6��vM�n�ku3B�
%]���T����7�3Pm��U	Ww��.��?��������������|Q��~�����w��W�w�y?0��V��j�@���aw��%���������)�X��c��i��O���!���l
�h���:k���	��!��6e���:R9}�����~N�_I(�L���,Zi��#��?����oW����w���<��������g/�e���+8Sg��@�)���Iq&���C8//~�����h�O�5�\q�����NhV�����AnQ���_�a���R�?���|���A�v�������&���	�R������W�P�����X����[R�1L�lR��A��*���8�����"��V�����z<����y0f���V���8�Q���`:��SL��~�%����P�rh�m`���"3KQ=���o��Es�5�u��_�`��*O��\8����H93nH1��� ��W�"�`p�4a���W`AR����"S&�R�k�������.�o��_�?D}x�r*{h��_�i��_FQe:�s3�P��_;V�
;�7��:��_K[%?�R��
��
y���N�[����R�����_��5}C-�~���A�w���0�a������>���WJ��vF�m�*���?"�ra��>h��H{u��������@�e�����}����P���
aE�56t�Od�)0Yb�(�m��jC�*� g�+����<
�c���|q~[`��e�'���zgcn��k��0:��28�m�j��MY�����}�\�| 
K�cFXTEl4�
���P
;�"/Ta����#��m�`����/������`2�[VAp;�E����r��Y�j]4i}���D://�rx������qA$�R
DH��X�t���cE)�����+��iA�8�S}�
\yf��SJ3	~��#��
�n�,������_����������O}�h���V��\��g�
_������5�P��w�n�;��d*�^()�b�����i�`i!��w�hF�tF�!���q����^R��)��/#�2�4�	��_��h�:�KN.��T���qheC&�-)zC�7.��v
�c�P"�ZE�X��|�w�c�c���9L?�r��(���hlU���Q��T��-��O+CF��pS���l�m��bwE+��$� V���<�,�q6�����
yyv��f��n�v<�������������[�<�x����=B�m��/d���	�����x�C��'����4�y\Ck�.���.��Z���o�����g'e��nf^wmb�
�^Z�u��.��Uq]����b/�3�7�~_���2�������;��B+$]9����vW1jw�.���7�}������6�������E[w��)U��FL'�4^Q�����]�4gP9`Z&s�����`����m���e�=9��k����:-�A�����B��y���4!������^��i����&���;&�!��hf����?�P���}.��d%�S�vx?�h*~�RO���cd�S*��).�u/���B��LP����v���AGZO�=���b#���#�vBG��
�-jK��m���~�V�<����J\i�n�i�,��������g���T����]�;Q�G����|���7�������Ix��V���N����zTd|���?B�o���B�V�W4C��z���]� �?��_�
*��c*�M��U���A;��!���DGo�F5���������Y���L�TE�y��W4s�R������^�0v@�����{L� �4\���Q�ei������0}=���\�>�*�y��"���9(�����m`T���]�r%��:1j���h������Ls�J��f�?n���K��C�����\�A��K;�"�Q�\Hf�%�0�Az��`�,��@`��?#3	!�V���%p^���(.���EP������'K��y������.	��$i���VP%�9"T��dQuJ;&x5�eWL��("��<����h�/�e���0xBfc�p�
�h��[DM�!�����R��:���I�b����b���B���X6�����z�V
:��Z��
{���R��a�H(��z���"M���d�
7Uiw�N���r�TB���6��9_w�c.W�%DY��N�Z���pcc�3T�!��St`O+�A��t��
9�3`t��8�����B\�r3���<����Ot$�P������PZO0���Ik����,�����4YqH�cY���/+���l���P�% ��~���*�("�V���y��4A�\(&*�!�A����"����\�sL��`�]K�n�5*�T!�j���4���"Kgcl�=����c����f��# �Qj����j�F=*��4|s��s�	/bT�	����
hX�G�?�\�����7-�{��5������}|x��?�N��yn����~�Gk�g���h�l7=����A����H�T*�H�?j ��X-[���Y��x���Y�[�Dr=����YQ�=ee��/����p��V���NH�)R7��S'_�Q��K6M���c�G�	�xK�-�����|��?���hC�G���
�u��R�qw���#.4x��KDj��t����CSv����HG^����@�ud�
`,"���c'%~Q��+/*��{�G�[�P'��(	W�2.����H�����HGY�Ue)�����	�P�� T������r}��^�{ed����E����-��d��*#$*$Jd���PVF%����������9������������|=��|��y���?��������ZPP�F ������}i��j�F"@b��5�=a���i	~�Y�9B�v?1k�h
 �������P�Z���)��6�?[�8|fp����9
 A\%����^�OEC�
 ���.�h�$��t�.
 �p�	G���?Q��@
z�w��8
 �X8
0~��7�m�?����[�~T�aU�o�W���^�Gx�����Ec���n������?"7�?~��f�������G6�?n��;�?��
��'�O�8�f������_�	����`��_6�����7���&����o����el
���~7��� �w�@���_�V��|Z������k�/��&���A�� �b}�n����� �~<H��Z0�/v�?����� �Nd���.���/�������_�o���@�Akz���BA0�Br��/����@ a ��8�\|�&�����8���+������[�o���/l��5|�Z_Q���8n���.��D,�w
B�7j6�_>'�����
m�����������T��6�_~���Nr|��i���A��B`���l�����/0�Z���u����m�[�����D�2z����Z�����0Gg�;
�+��/Px����&�7V���E$tk?�_D��Y,�o�A|������	� ��	�����'^�/����_"���'X����
�SY�����>��1��s]���?ahDhk��!� ���K�_��s}r08����C���F7�n�3���|ko��4�+i]���?�h$	{H4�`�����O��#�(X�������?�����7������'
��F���8�/M]���?QP`G�>U�m�����?��>��'����p�{�b��L�7������[�(Hu��f��.���?Qh(�����{�O���l|����	��F�O��[����?���G��&_��C����6��-���/�M���.����Qo�5&�����w�������4�N�`���
��	���.���>>'���(���;�
���?����o�x��������/����#��h8
�]��H,��� H�?����\@�����`a���~D������%��?P(�c�J����a���d���D�����FPG�p�/���`�?���Or��0,�I�6?&����!�a8����P��<����S�Ap ����O��M��?%L����3�`��P(�;0��`O������o��C��A08��05}[�
A���O�&��7��A��[6R��?�!�l��&���&��&����������n��o����|�t�����\��!7��^~���~ ����A���6�?���7���&����o���DCP�u��u�;���
��i,���Z����`����l���|@4r��/��������h��(|#Y�o��A;�ZG��P,
����Aa�(b�D�E~jz����q�������?X���~H���m]o�������]{Qh�0�@7r��??A��*p_�>�^�X�?h����M����\������c���o���.�6�X
���q&�
\�&����_��ZI����m����|�8��B���&�������r�0p�C��Z���9�r�{�z��ke���/����5�P������k������?��,38r��g�r�~�=��l���������m�����\$v��;��o������`��?����E���������%����hp�cA.�5���w��Y�iL��9���?���A��w�V`�=�S4����N�����h���T��������Y4����X�_�H�����A���c$�)��_���p
��M��/���c�p�A6���|m����J0���e��`��v������ pk4�����������EM�b��"���;���*#��o��A�1*�+?.��7���s����?���/�U�o8t
���l����o���&��{�7����0�z��~7���0?����X��_������X�
b������O������W��Qh0�v'���~�A����|k�70����(XY����(4�.fm�&��K������D�?�����6 ���b ������>��DB!(�z�
b�6�����`Q����p�,�7�
�\����xk�&�{��D� �8�F�G�&������5k��X(p����!�&��w�o�c10��D\�k�����o�����@��6������f�	�N(����{������o(n��+����M��_�o����������
�V����M��O��`�!�XW6��k\���u���(`">��FaA�i����^��!�a�_�?(��Y�����#6������K������|�D�c�pp����
�)?�[c����n����n�����p8��?i��1�?��y�����(4�C�Q��1H��P_�Vh�G��C|��H|�0��Opy� �E�fpr
n�5�g,�H���GA�P�	�VcQ8AA�h��X0��C��(8i��*��wO.�*���">M�@�������C�P�,
�y�A���d����pf�C�
-�S\`�|}���~��@��l�~�� � �>�����~J+"Q0�o�/ ���R7>G�1_�7�*�7
�����|�N6��_3w,"	w�X(
�����6�u�;���M��0H�^O4����S����?����`��
���&��8e�"��[����w�~>������.���`��������\r4����%��������X������4+���Z��u�3X�_F6ak{�cO�5�3h������q�|,��p�_�:��<�a���|��Y��g <	��������1�����lM����.L�y�vX:���Q��0�?V6�0�������
����k3��w�_<,>+�9��O�������XM�������@kP�jf�6�O�e|�>�s�?;0���a�x]~0�8�i����.�WF
\pk,��C"7|�����q`��!?GWf�P?Md]����}v"��D��/�����a��	p7,�U�	$�A~B�"~i���p���]#���(��`����8k��O��a�?~�w�O���=y~�p|6�ZO�aAc���L(f��1S"������/���a����W�����'���W�
�������6V�g?AvQ�� 0`[`�b����>�U~���]�h�EN�/���.��
�5������������z������g�Z,�B����`�(����/����">�Q����!���y���~�|��/
EA0�����������]�"�i����o��FA`���B��'��A�wmrkLG($	�B7�����b1�q��p(�@a_�J��O���qP0��Q�6�[�_���{��h���~��:��_�I����0�90u�c��P����/��:D$|�6.����w���phnh���/
���kj��f��g	H�]~����<�H�7[�u����~�������k����}����h,�����(����w�a���?�Q���M���E��a�Y����J�z���� ����D��]���Eo��v��$��^}�1O$�i�~��/�V8��4�����/�`��s�P4����~�
&�k�����/7�Vt�>j���/h<C��nD��0�F/>�����H��F�����������kk�lo��p��c�����?���7�?8�/��C"0�M���
�-�xx�xx��c{O��B ������;�s�Sv��1_�7;ng�7u�q��;X�Y�����!���	�#��--]��w^����`���a~����13;S[g1�
�`g���HE��>T�JrZ��U��URP��v�a������=��=���^GU�B���5��h�*�0�����W���I������lH�2[�����	��
"MZ9My9������W$�
�|�j�IQ��s��������n���O]�z5���������/geu�#*��%��,r����&]�O���4����a�Y���Y�^�N���������������j��9�4�e�b��������j����nS����D@S����zq��6���*��0Wf�Y���0�>5����xk�g�+YO��&0�K���Ros�[<z����VXd��y/4K�
���k�|�� ��m��B��E��L�����y�x����%�avaE���+o����Wg�V���Yy��y�1��Q_E�p|P0�Q_���|�mu�-&�
�������������z9��6�{�d9�YC����,�H5��#���`J{9ZLA�3u�"w���Jy��S����tu�~���]�s����3u����Z���F���W����K������P��d�������������.>,:�kz����s����S'�N8��`��dp/Wl���$�ti�<M�O����]u�WH��Sn�����3�����6~{������7
"O��_�����,�����_�<�w�Q�����"m��sk��P�X�����l|����'�
W�*z���|/���X�����H/�T�R���~��N��C�#����W{R����
W_�|u�������������� wE�4N+�'m��O?,*{����0�S%��y%Y�[�O�����yI�FG��&_b.��Q/�k��	V/��%�*9�r}(B|��,tg�8:��y�"G�C�w��7�������y�:�}�mW��KvK���XZ�K�^Yx�����R���8���p�9�f��H�E�]3���v���g�[S`�w[%?�����4_/T��R��O�{O��z��d]���U��gD���
'��+�X�d.
��F����{�O�n2^��w{z��A+��
��nG��-����>{[T������#�oLh���zh`��S��"��Q��HS�"W0�<S�SRJji�����K�������sj��b��.�b6��|����i��7Y�{s��]����#�um����xT����3��w��J���BX�_���|)uiD�����2R�_Z����u�����ea��#��
��:�o�e�Zj�M�I��N7���I�z������z5��{���f���.5�A���s�f+\��������d����v.�{2G������W����[��:���3�u.���%we�m+m�^HW
R/�]��:�RW��B���o;��}��e��O�-[�<T�-_��y>7O����Z�l�
����P��ZL�L=_���#y/���y��v��~j&�Xb��`��w������0���^� �7��d�1?���a1m���4&��6���>-�2D���'�u]UV���oh�~9r����;�fB>L�#�#�o@?|��+��o*]^5`*��~������v�
��r=��Y���YF*]��������'��1!j+z��a�����S�������KH����R�6I�2�V�WOT�����q�u�I�~����-8A+>y�Cn$u��2��E;����e<�������l�J2� ����}��<�7���j��_.����P�?��1��U*`������<\K�X��i,0� �rDN
{�:2��������'*����)��q�}<�d��)�����#{�q�;�����+����)�k:��Za��c.[,qp�xe�x�h	��o�*Hc���������^c��\13z.#O���l�7�q�Tt��k~�9.�m5�~G���)c�&�gi��V4����/o��;
U-�'0��:]d� ;�]GG]8<nx_�3p����/9
{�~x���,��������0�
\=��fC|�]���AhX�&DW���QY������Zl��R����*��x�"B�;yb)
��O�B�o6�������}�����	�MNgeX"�-i�'m9a���/r���^i7���96���Z�TA�]��)�����6,[|�O.�F���E�q�J�!r@�X�va4~��%�8��������gR�>��pE!I�32]�������_�u�y�P�F\�O��/5�[��������-�?-r��_�ct�q���vy>yoe?t�u�{4F�d���;�����S���~=��`��3�����yM�s:����Z����qN|F�,^�Wr���,�^���[Ze�*���������p����������l���"��K������W���r���dcs4�7�Ld��3��'���<P��?�C1x�5�������d� ����R��D�N������|c������������V���l�I�]�Y������[H�Ed��/��(+���6���\�T����Xy)PD9^�0v����d7�?�x��G�s-�>�Y��R.w1n����3�L>�t�d�V /�El)n~d[�2�zsZ�1�N�?�as���)-"�9dG�f]A����Sh%���}���]U�Y�����S8��}t���J!���8�]�e����;��U8�����_����dg�w��=�&��c���x���.�$���P0Y5�j����X]�W��(����l��;�-,$(q�<7�.k�;��g�`*�b����
����T~�^W�yY@�
��B��<IN
��Pf]�vt���t�����#%�6�"*�x��~�w���NG{U����f���4>C��
�N����`�N��'����G���T��wZ4�L��
����i��U\�[����@����%G���4��8y�][�;EK���N�	>G�E���'F.��:E�b�u�
��.6=r��c�))/���Mg������h�@�t�>����}���o��9$Y/lJ�3��q�C@q�|��������2&���M���(������/��}V�h�1�v{�t���]����`X�s'L��?�w��I70#�����&�����9WS��5�+b( �x�{�ZA���3�C�'^|�� ���*�s��5J�c�2z0z_����������i��Hh�����fZ��K�1#���p���%��Xa��%�[&��6!���?�u������3Sw^����Q�����&�^��5r{l#X����QpJ2�,���vAF�Ac���	S��H�6����vn�mk��e\�����������v$:�NH�W�a���&��Hx�&�l��dA��Y�mh���Xk��!�{g��qT/����}\�M���7����CL����1Q���s;.��P~A��)?�7?L����������W^�}��;_�F������l��f��q�"W�p5��7w_���9q���-{�<�_��0G�,M�?:�J��e1q��Ng����{r���g�3h�XH�}~q�5��^�W�[|~s��M!���V���$\"{YX�9�S��JE�*�u�M���W`�K�������z'^�E������$����V��sH����/��7��"'��#�%;ug���I���HL<���W�+����6%#�1��9�l�:�G��	�Ma'J� �sG=L�_���_�Uw>\"5����qx��i�6�d�4p��C�$�|1�%�M@&��2������]��5�NEX��Z�#.}�u���Fumb�����[\[����G�1:�����(6�m�����v������-��O�@�Y���W0��L��Z�K��Kn�U��.Y�������.E\
�i�j>l�������)�A�uv�0����BT
��1��&\$`���_�8��F�����a�
b#�������;T!�%��
'���;��sj��A}4��KT:~���^Q�i�e��qE"��wF�nH���$.��������*�q�=�{g�4�V��V����
�RDSg���v����*�5N'��H@��ZH�X�z�?CW5������`F�U���tP\36[����7�\r�"Z�U�5aIq����>�ge$m��B�7xO��k~���1��
�h�T��P0C�����We6*��%s�u�V���'(`f�x������{;�����d�����v��5��{J�N��E��Kcs��0j�N>�`D�o��j������e���-|�%m����6�����J���d��#��<Z*��qV��n.��U��V@�|��p�o��qh��!��K�<�5l�-�_�Q����x)��4���m��i,D�{�lC��"��g�%��=�.��<d��.8~`;���q�d�B�z%-���D8K]N;^�&��|Zr���]#1%H�nMO�'^v~=q4������SwB���3�;����������r_����~������9g������j!.l<�9���T3x��|3k��:�%���{9e���h��P.U���h�x�������;w��qA���Y�O�N���a�A�b��l�md+��t�`Vw6*�5����;��X��>	��9�pkr3i���jA��~�X�;I,��u��4��3���}�v�:b���WYS�C�����;�t��^�I��
`P2����#*�6c�E�\��nb�5IMBg��o�/YN6��M���W�<!����N����l�u�����K��{�)9��:)�M�Bj�-��0����n2��o���,ZS��0d	�^��s�����&%�j���'��\��V�j|�����-�
Z��8�j��]	(�>5������%<U-��yb5��3G8����j7���2����$�
���W{�����n����5��6a�_y�m��^��19�J��M�{���a�#H�@��_b�:?-wJ2q����Xnez����Q-�/
�����@n���7��}f6���,TX���� �K�on�����
Up���j����6�>.�>9�����J�����sH�](}�dBf��B����]�TS���&L��1�q:�
��{<3T ���w���v%k���7F����������s�����;�nW�{�&��v��pVM��m'�gw/��W>]yW��z���
M���W����_m����(��K�Q�����|�\�n����2��F�������s���i�f^����}���
�K���uh�I�URYt3�+�,�Q������Nfl�%�Q�����r��2������/x�<�+�[����/���k�X���M�������������M��r���5>���<I�zI�z��I��h�^���d�x�\��
	�����~������G�������W9�I�_&��N�_w��>1�3S�}1����:)�D�{U������%�,�p��%(Q'�!on�X%pe���z�~jR����~��
&��77qv2��Sq��z��l���\AF?(�>:H\��3f��4����#"b�����/Z���Jj\;:>�G���5��io��
g;��������*���D+rUX��,���&�A)�&�����IT�y<�~"J�w������������=�};�^�6V&;P:�I�/�%�'5�r*��=�����e��r�|	�l2��lE����>�.����A1�Xg���A	�Z��<:�%��
|�g�����l��)�q+Y�N�exC#��#l/7g�M�������(W�0��l��x�%��#���TZ
�_5	�������v�=q�R��7��{e$�u���9h���
_F?"+��6M���!rH��t�=������uQ��%p�'��PV,�*�?Pd�W�	
�!�U��,�{�2A#$y��y���}�'�/���0�f���4����=f�	��h�+���.�W,�3�J��ZJ�b����qd�T�-��
rF�j�����'�0�<�v&�[P0���q��&�P53�0d��F|�K�>�R���)���P����ml����������AY��u��!������y���T����X�%^|:�kb�K�V)Y�J
��Cn��'��h��A�\�|�r+�C4��3��������O�2��D��p�z��/�����r��-������jb���%��<�$�}�dTz����3$�U�7s�H
����9���P��;!��orp�w���wg�tH��K��4Ku���O�����C�f�f0#d���tl���LK��[��wi��F�rb����L�������u�4��so4���kBRe$��^���J�j�'=
����@6s�����K9��8����Q���e���o�Zc��{���`y�+�}���a���}��L���0�� ��*�+����2����9u��w�B�d���B�e9���I��TK�QLU��`�4���Q��-b��-oki�K��^���H���RO`�����k�E���W��e��&X+#��������{�Y��+�c;L�G�<����1�U:���>:��X="�n�[do��
Z����[_d?(uW
0�����zP\J�2��O�>���J��q6%�q]���W�I��$�GN/����B�";TtX|0���R�l��b#�^�6����u��Ki�i�3�-o3��:�e,��v���%`����l2�y���3�IU����"���^�r�������**L)`�q���D�p���l�+�N��g�_�j��r�(�L�|Kl�~�L�����m�j�P���|��)�f�\!z{�HY���������fY�
%�*\�<j2H�&(�;��z�>[����{v2]d��z-�R��Q��n�'~��o �2�f���W�(
L��2sn��8�}/l)�
��}	�b�����������x�nV��U�~��n7bjk
�[���#����!�K���g��4�;���������WV����Cl=9�).w2Z�l9OI,~�E������!���}���s�������v���UR�n�chAq��b�B��t=y�Y����M��)�"*���Z��1�"H�^�W�Z����q ZE�v�3���0�KI�'�� kN��W)��{�Y���3�#UKG�i���n�$��%�"��6��A���Kp���j�������y��0d�m_�>�\.��b
����f#�'�0�!���ao�uUV���Z	����pb�p�$����]����<�e��G���vu$����d�()3�RT.�����il� 1IR2(/mu�f��1f���������6��S��u�f�V��yLwW�1��Q�1�����a�M:��o� ��w��.�=��j�P�x���i\��k�`4���Iy;��������iv<p���b�(�v�D�9�s�L?)�������D�-cT
��b�c�(}���g���]}c@k�r�r�l����/,r�"��e�^�<��D��Z�$n����6:�ZU�^3�m�1���p�q�!���}m���gYa�Y��K�������Z���e���p�Y��5��B��M(��������N���:��x=/���3��a�
��Y�5��b[|���Gz���l�UY�i���������.����$��w�w���w6
[�"��8���B�&�H���,�nes�����2����a/�b����'r��e�Yy�������+W�"5)��+���74:|�Z��v4V[��0_M������`��c�:�����b%�g��P��!��K�-����&���i�e��!;������-:�I��BD�o_:fI�H-1-DG��H$�0�'���ZiF���]��h}te�>�����TH���	*6f<��A���'\�&������z�]� �x�e�0�r��h��3�D��v�<d �Pyx��%34U�~�eN�|�E�����wO�W������w3��t�=T��J
�O��:`��Lp���c|}���Jr.q�U<�oHb�5�^r(
#��#�u�IO�MOW�<����+6��_Sa�;]�0����r����v%���
��IHd�t�aL2����Jr_TpQ��y8,�6��A(}�7#l7�lfW������C1����D�t�hd�ltAQ����7�����������{9HL������E�J�.��y+�-����;��fG�����!��NR��6��2u�L��������]�O��g_<X.��)_���B����6��^�X���L'���N�E�9�3��I\L�Q���2�1|0xi���F���t������&1y��� q���!�"n����S��"�t����m&���=�u�:�8��V�hU<�^��r�r�������fQSC2��g%�L	��\_gy�^����Kn'�>0:�D�D�{�o;��3���X���w��(��P~=(L;���}o�*�$0?|����zy�E�j-�}i��I;����B���r��7��������I�C��O�n:�\�b��`_`!t�����!����m.�F�k��(�"��|���Nm�?\T
����7�Q�����8���X�k����,C��\R���9Xt�����z�����R��S�8x��h{0I�y�������x"��fp���>�x�`�oUB��H�����b���)�P
[�h���a���$?YM���;��"HL�{������Q��C�m^��!��y��	Qs3H�$v���������p��N&�w�����p����/;�#'���Ua�,a0RX�kFQ�x�	����-�C'���(k�:R��|NW��B�<��X��S���})����r����j���-�{�6_�\6�V���'�����~�����n�V|�kc�w�'E���B������W�*o(��1��!���yE,B��as��Swk����J�Uf�|�Zg����n�+��<�yO��.�y���y��`2�6��y� R���������T���J��e�{����)���(�>O�sS����B3U���Lb����z{U�U�����
C�J�+Cx2U���9f�	��4�����g>B�-O��Ru��IZ����4�`7�iN�q,B�7���dw��a��g�1\�kGz�b�w������>�=t!���!4��6�3G�}��wgC�Y�F��=�&�\}�:y��s����&�|��4�q�1�xi�4�1����O���U�eq�q���D4������!�=U-�P?�)�������#����������������/9��lwP{} ��7a��)��_��/��p�)&}�� �CF��A}���������nk1=N�D��������7�����)4�T�E�0}���������x[���HH�C���Z/��I;��/�X�=Lx����,��n���"'_�T<:XIX�f��4�e�����Z�7R���{��7�������_����������7�v�L����<O'=������8��>#�7+O%�[�����01��
�I��(�-���1��x�.:��"�Lu_xL��������E��~�����+y��*�9����,
����ZB��^(_����E�3U:�6���%y`u_&,2�|t�����qV+���f$�CO���4r�N�������	uo��	eu?�0U|��4F�
Z���L���j�z�������)��f�������Q:%��������{+�/����<m�^���.Mg��r.+C-�������)�>R�~��{1y����@����-�
���'k}'�F:`�D�
3
�t]�=��C������Fbm.'�����
q���=�/��O}��q����k:DJ�)�os�]E��3�@�z����q'W�'�q�FP�����<'����O���~��"�cu_�����J��d�Bk��jB\U�N��rw<?0�Z��Wd�!�^��`%q����3�1��IMc,�`:>M������Jba���5f�{q������.8�)�Y���k���}�^�{!)|���i"�;�
G��?b�2,8��X6��fj��������(W�v����)�*�_�R	o��&���m/���n%5%/w���l.��W��-�����,��m�t������:�����\�iA����th��8x��I�>��4E��C�+x��C8�7�[p��,s6�]�������.������YQYZ��q�w=��~y��$���bWj3w���[4�i�s=���{_pA�)i����(����l����:nLHzA�"�����'��L^��~����c�����2��y:i���cw��C�����B����9���G>lH��������d���}�M��1m.�K�x�x$D��[Y4���^�HM�V�we�����J�%
E��t������V�)�+�������V�S]�x��������I�����G�p��GQ��r����/��Q����3�2�"�]���2�#�
(�L�f���
�)�8��O���`QD�
��[]N91�b4KW�(mn~82�L�Uz�����;Iv���L��2�N����V��s-T���{��c=�B
��%2|	f�%��ja�T�����H�f�V���^��7��
�����OgQ3�-�y,���Z��K1���r�;��x�����N7{����;0&��Mn�oOk;P���:�]��"��K��y�z�
uB������|:?{�
{�����jB�����]@������XN���V��^C�2�!�v��#V��v�&�hX�����c��y�I�}� ��N���UO����?3Rr�:��`N����������J�S����O�\V���
c�]�����t���*��!	���xt��W���s�]�E@�G�*$]*8����������u���'��G�������y�{`k�(���3]O��3���-Kh���m
�c��egPA����L�Z����������/?�HX"H�:�*m��kO��pd��4���	�R3s'���9������D�Ghn����p����r��}�t�pC%����}M��&��G{������A�E0��g��s�k�#1en�t��N��+����);i�IF���	�q��B]o� �h����u��X��Ju/]�V_9Y��Y�x�>�"���w�M�����bz�}_�1+��k��.�����HaK������e���iT�����N�����.�6<u���l,����)���;���Z�F�������J���������Y[I4G&���fu�f��
�k��G�j3i����4>O#i7�z��n��5�w8��i����
��������	#[7aG	~�S����-ts	l��.���n�����$X!��DE�%
2��U7�����<��	��]{��{
�3d,$��y#S6+w���L�t������.IHu3W�b'-����:�gk4�ag�]�{o0iA�(��P����|���!,������<��-�EH*l5RF2d���M��31W=��� ��nA4�o����8��^����T��,��D�"���`y%/���;=~(3�pU=�����v9���wB��3[D�*&%7���:�J~������;vb�Gy�U��1d,	���v����w�����S��<]Z��h��|
�W�����qL/K�b<����,V���z�������g�xb��	�TdB4e��$o����bN�*�:�h�|�k�rqJ��I��
������d[�K]���R�ym���	�����a<@��,F{PA.�'j�^���c*n�6�463kl����u�.�9�__y������=Q�O���MF�`z��)Kje#�dKp���1FK��$�F�_�aU���=>Zbz�|�i��A3���A�L�:�K���'c:ev���s�C�����m�4K1�������p�R���+�
��/�T��Q3-�J�R�8W�of�H8���&�v����OZ0O)�����i�ciI_��^�P��\:����"f�X[����u
���Q��tT�p���G�tJ��'W�]���:��MUWEL#���}?���5t�,qW�!~_��d��������O,�
M�8�f4)�38���.�!#�4�1{�hQ��O���?7Zv�n�K_��#Bx��rgOK���+Po����-5H
���MWC�I(�}x/y�C���#�����K�o �uf6t�W�E��k�9_Z��h8d����vB���/���UC�������]"�'��	���$���h���PW%������ ��}-A��;Ire&�������)��,_�m����&\}zF�7'��%0�.o����!l�����������+�������&���9�w�F2�0V3����P���N�S�.�g�y�K7
+�g0>�*tav��;�T���K�h���/c�D9:x]�x�Y�����Q�
7��T�~�d�����H�;A����y/���{��1�x�7����SnaM���V|B�"��r��5�i��q]Te���&5��f�Y�J3Q]�z����R=5OM�����.�������7Ds���5��l8��Y���-ns��O�[D�=��-������6#�,�����EO�t�F��;���T��~�����G�Z���R��F��$���0F�?�8~|7�3w�X������?R�����'6���+��h�Z����ac���~�$>����j�U�/Sn
A\�f�/����Hv���Y:��������n8� ���ma=�-�!l��(I�3�QR@R���M\��s$�Z��Rq|��p{�3��%}��jB�O��S)�v������� �~���G���}��[3�3Q��5�;����]kI���W�=G��9s|��W�=�����X��!���O�	��7����X�
�����oC��j���;��1>�F��M�7�Tg�)�a�V�E�#i�p�[:��&�����#�i�w���t������j"����[%�����_4���Y\n�w��~���/],/�W&p��X�� nN�o�S(����Jjt��~��J!� I<���^��<�6o��:W����Z
�I�EW���-���?x������P�;�A���:�-�������wOp�����B���siE��C���d����-]_��{~�zQ�@
�Z�R:�F@��3�>������u���@g����y��cz
�4��)�Yw����R>����_r�������m����3���uDv�;�)����������+���`D�\7�iU���:�*�[��:t����T�{��GKa�-#7Ij��Io�K��m�a���,��_�sW\����}w��A��t�^S>A�&3"1iW��Q���&��&J��G�0���~���q3��K��x��g��*gP�s�P'�Wax/����+��G�9J@Ep
�^E��n����v�p5��9c4�F�xQ6m�L����(}.��#zo���V}|j�%=��*���z{^�����>|��e�V��5�|6���[�zxPCK�K���ywD����
��Ztg�����Q��:�����sGx���'pW:�,�Vn��~��6~�D���������r���?$�fP<���<���wpW�g��k�������������(C;""B;�|��Ko������z�Q�����#rCLq_�k�G*���~���v�?�����=|Q_d�O��HjAcQE��3=��7,�x�v�W��{>?b8)t�+}��L��|g\8�L������#���y��s@����M������r2�haO��g�6���X����3�����7�_�p%R�^qJ�'#��A���O��������#�VM=��n]�O�{9|��q���K���d\�Y+�������q��#G��ps�Q�22��U{���@�E��v4��(����G�2��*Td�Z����	���t]���r:��j��*�.���M��H1�o�[lns3��������7p]9��>\�W�]��.M<'��J������N���<*���a#�p����:�'l��{��=u��@�N/�>����!p�>Z����k,���W�^�c��<X`���1;��i�cO���������OuX;h�������N��81���o!���P�f�oq��J��g��9�j���/B��%����^�/�O�M
���3oH���e�<;��������{>�0aY�J];��s�
:
t�
�d�:,�!���l�>{��s��?�u����c�I��x	j�j!����&E$;��n8F���B'�>����M\0�{�����\�	/e�v�����~���Z����E	q���3����t9�]t�9��Re��%��z*Q���3��y����]XiV�����Z{^�sf�k~�>���|9�A��r�%q��:���q��<�f�.i��]��=���j$S�?�.��C�[T��(&G�4�|�v�z_$�����p�H�)l�C��Q4�U� �tRfn�n��J�n�L��8O�Js�>oM���P�jW�j����str���&����/�m�E�\�q6_7A���)�����5N�V{c������~���G���B��#����l.'�-�������q�����}���K$�o�I�i��dJe��A����#�r~.Fv���
`��GX��3�'���~:���ui� ���*~!%�������ZPa�N�� �PMKN�J^��'��G�;@gm�����m�����b��m���b���m��������������q�{/�5�>
-)l�Z�4Lg�����B}u���������@��D��	�2!aC�i����6�$��j�AE�����a*��X���%���T_���
E�g~��������a�"��=7vV/�|K1�����F��Bz�C�I�%���2${V�-���U��"1|���k�ZZN�h�t�Ob2�Z��o���yA��+N����pC��&�4(p�#nK�a�����3����u���������d���H}8��
H��2���N1�
Q<����N}@���
���0��{�!����kL/�BEX������
~G�tFI���n����������K���u�^�N�b.}�����"��,�$���N�����'�E��H�1����V:��e>?�x��y3�h/���4&�����s����������m&RD��YzfS�5T_g���zd��^��1�M�$�(l^T�$9��TWj��A�_�����L-U�&m�8[���5��h��O{�h�����8�"7�
j������r�t�X�B�gCaO�;��g:%�c��m������Z���/*�����]ey���r�f����	���4Np\�c�]�M�}�A�B�<����������	�U�^�U:����`>��E�aYS����S��(��/��7x��V�Y����Yo��(��4�2����m�����Y����6��{�
��D�)	X��o���]M�h,�%*��^���P|R�5������'������l��n�L�&u��M�{�V��x_J<��K���p��H#9.?��A���"�lN(�3z&���w��
lv��N'��������&�������OY���6�4�o���`�Uc����g�@$�����B�2�(�����pk"@�����Yp��j�-�%���E����T��@Q���2z�p;	F!������y���|�w�	�V�����>��/�������Q��~"�����U�W�*O��;�Z�M����i���r8+��L���#]*�3�O��THUOj�)_2	�����J1�V����z���tt�������1 %�O���`SVk��#cON��]�<���m�����e���q��7�� ���l�����y��R��LLO s�:3��H��=��)a�4_��9p��?�W���nx/���r���#~<a�t��y���{WB���H��e���1����J�:���f`��c��7���1b�����J��xQ�����BM)�/	!���*3E�
�q��-���wM�_��x�X%�N=O�s��#�@�uo����94��+O/�����/g������2���T5�0\�N�.K�����4O�/�ErV��@�K�xNw��
��I��`�+Q��}����s�1!�i����c�qF�E�Fgk�gD=����pH���8-Z����$@(}PZ����I�d�6�pr
���������i�b-��F��c����zK<��(.J8�k8b���G�s�.]��nyt5>n��p.���\��w�&�M>a$9�)0���T�����u�je��I�KL8��e[�@����0��{8q�5�	�E�~�]�a8�|�n�����QP6��/�~�����.[=��)��vmZ8V�i�3�6`���� mAt��fe�a�@BuW�7&{4���h�u-���X�ul�6�p���t#�1;���j�%���xe\�&I�~����������U�{R�lL�^H��g �~Q_(�`(�I��}l�;z�����Ap���B|�����9��d<�eYKP^=���^^UxR���W�x���?�8|�)"	���0%���Fa�p���I���~A+��cb�"���y1z�1c�s*Q����X�AHq�����:ku����b����a��3�5����|Wp���/����Z	��#���FG���R�l�|^3B�0D,O1���<W�zB�mMy����P�Y�N��O��z]~��u���)��1�L(L�-�U[j�a��+Yg���rd�8�5����K50��g�aY5��L�g�h���
']������yx�!��u�����U4���������U1�Sg�m /"������?hy��N�wV*�6�����kX5���Gc1��^2Jg�5M����j����3��c������b*|�>�X�N�i�Y����w�v���*�d���V2������ �B��F�������L���8fb�N����C�fj��B���q�������#�[�r~��(z��I9'.���W]��� �f\�M ����A��fM_���<�f;���.%P�be"Q��>\�8}���E��/@(?�C�	S�g��9����j"v�����EweC3����n����9���7v�������|c���]�2�!9k�X�`$�wB
�����&�M�{�w�����[�TV���g�_�?Kg+�P�H<���aP�s!�`���I�L}���.�X�?����^����LK ��p�A�*���u��?��:�
�;5���;Sbx�����9Q����UMv����v��xZ����V,ja�#(�	��Ce����"X����J����H�
�0��b����q�1��*�(�y/OZ�E<��OA�`hCU�qw�
��0�Z� ��\E{t�xB����,�+���S�~w�!�.�X'k:aDN����ui��Fi'�1o'�M)Z���-c��B#0��d���)g�^��a"|M�o�d2�7��n����FrO�U�ts��VtK���D41�P�D�^ijh��
����������:"Fc	�V#b
_���T:�u�e�(C��i�����g��-	fb/bo�6�������[��7���D�Bx�������H��@�q@9����#@��F�Y��������^�x0����%��R���#]{S`7p��K��1,5T��\a�s&}=�k9�l�}��,w�����
E.�K���3����U������5�P��ZR���2B�sc�*����Z��JX����u�����P�X��:�~�>$��pd���F�������KPBi4 ZvH
.���O:�K�6"�m�R��d�B�H���}}
�)0�h��_aK�9�n$Y���:����G�~�o�U��:�y~���K7���^�����v$2��./�
��t�"���)o�3d(��>����G�d2k����sYU��>vt�~$������M��6�U�'�X�%J9�K�-3�[U�t:�	K��������0�I�!���h����C�]��&����
�@���YZICzzlcr���m�fd���O��X��r:����3����dZ�![����|7e�FtYf�%l��?�:n|�G.�e@%��w*���
m���H's�����)�����T�k$��[gb���xR��G�uL�dJ"�t�y
w��]
��B,
�~���n�g�M'u�	�����2�bs�n|����6�hC���id�V��b�_�mGb����N������}f�l����De���FX����^�6|�/�dj0��%��k�Uq���T|����m�N�er�Q�%��-|�w��r���VG�+	�0@���K%�#P��D��8e���=�"�$%t��E����[��i��f�	�o��o�����A��x<W���'l���u�!�zR��H���x���o�~�+V��;fF���<&��8�9!(J����@��{q��r���C3��RW'��c�
{e��.���L�)����������+��^��H�������U��\����"T��RN'���}����cw��F���Ld��O�a��l�� �!:O%�����O���x�������;(��"��t�7u@��}� �$������J?���t��]:��JL�`x�~#�8�8�qD��&���h_��~�|��o�[AX@<����3�o�k��sG��P}��P����0���>y&HD�BS�0c_�(*�NE	�Ir������|c������3�h�7A���L���<:�S��J�_������I�	|*$�8xd�/�\DNc����j"��H��:������o��w�j^Q����%���]�Lr�B�y�������IS�����M�5��EB���K�����@������
�,;L��
�a�c�'I�3�#e3���@Zs����S�5:"e�O����P49~�'k�<�+������u����c�%��d��A��/JO�Ol�1�X�7�0�C|�6���5U�<���������7j�,Z����(�Z�?q�7<�q������C�����h4�FD�V,�Dq��hW��������r6�*��m����EA��6S�}������K���a����2��I^<yp_y_�k8��8Pr���pF��_�2�.Y@����~�P�T��/g|H0���~7z9k��*���t��\G�/[��)�,���>XB�He���9<����e�ea������Z=����.���hx���c���.�����N�dZ[�e�;O��n�.������dE�*�5
m�g�����0�`�!7�(��&���?'D+��e�ZYp�YX��Qi�M�����!Fm(�m56o� ���5�����S�Js0������m��B~��#�e�FB���W���\�	L�wrpz=� %�xQ�I@pl~�C
��(�����
i��i|*�O��:���� �������	2��>l*
H�i���*�+��J���������{��$����< �L���Y����>l��\�6��z�zU
�'�S���Z
�5������#6+y[/�,�����>�Y���&�o�,N�d�����pL�T5���_+���*�&������-h���*�vB��Rl��>���Y6]� ��+`��~���-�������`����i��{�%�b'�=�����K3��� �ws0Q\P��	m�GHytz\V��1����H�:�\����#��C�+�
"0$����,�2�c�����&e(JwH�_ih�!���w�!yE�pD��GT��PJ��l���BE����d������{	JzRaw����S�V�oNu"]�<4��p)[Y�*_���D��Cx�`��~�mz��@��+�B������_�}���n������e�z$Q��J�Q���]l#�����L#�B������_��9�P�Q8Y�^l6P��W�_���T/����yI��e���Y�I/QI�)Qou#!�2�W��OX��k������`�a�i�p���
����7z�It��a��(}%�m��[�k����}��#+���0b�I�G�	����<9(J=`�#�����P��^��
O�K��&n�k�����n�%���>���h5�����-������&G�
�����_���NP,�)q��,5 �0 ����Is�n�	63P	t���G������af� ��:���7c@ENhh�F������I�Yz.�����S���R����������V�3u��S���K�-.��b(<��{`M�U�ahW{����������)���6Cu��I���0����=��T:�$�A�&���k���X^��L�t�f:�S4�Y���������h���9x�sQ(�[5
����
��}3�P#�!�t����~[�| ���	)�b����Gl��>���V�] �a����_�����.P�y�ed��f��2M������0#\+�b������w������K�G��n�J?��X�)�������sf�����'Y��_�"=�,G�������U2�f�)*��)�v�f�Qk8>j]�����'���)�I�'<-k��P�C�m�~�a0��#�����!���q�rij��8_vTp~a������	�b������l1��u"�}\���hi*���!�Bu�}eg0�d%������C��
�1Cxs)�����"�O�O�+m��T�����C�'�+��/������/kq��@����>�%|�UN��%��	N����M'���&�R�8�s{���QX���r��Q->����/f���n\R�b��2��m^%��IS�uK
���2�[M
0�/d{�,se�����:[���DV�7WRZ��	���n�\8b����m�P�������f����n;�i+�?"���T�'����zN7=��={�1�$���<HH���z�W?�:b����d-�.��+!D�TW���F�<!�9c[���5$���
�F�?O'R��{*f�FZS������F.��8M1�$�rA���rc��	�e�������<g
�M�����^`�����v"v�3?���w�f^�5%���Q�$�KU�6�����a��V�-���u�O,T����+EVb���,�;�\0�2Vc��z�������"Kd#+�igO�6���Y�
�;?5w-��p���>�Vl���6��f��_+(���.(��	��J���%�n]���s�L�j`���,� Qa�	���o_�%8ZW0���1]F��m2�"�1��H��x"�Y��w�a��=3%O����M��M��k&��n��l��%���� ��iM����:�����b�-]u�i��������8�����F�xk`���LA��_E��������rU@!�DHy����������2!}?e�`L��t�[���N��SQ+�A"��������S���G��r���%��Z�}�j>��qk�*���y�X#m>��}���vR��{��aK`DRh��=�u�����cs]_�D�MW_����!����Hy�{�J�����x�=E��A��]���b&Y"�e�l��]��
���iYu����5I&��_l,�W���j�����#�p� }�����.t'�y42�������>Bt�nza��|�l'�3t����(.��S?�a���^i�����&~�7?��i��[��8rO��jF���H�W-����:���^/�K[�}��BuL?�:���t��z,����4T`��FY�;���)
C^��e.w�c�OET��o�+N/�F|;X[K �Oi��m���%J���I�����<�U��w>����m�9l����k�m������)��>��H��jN����@����7�3�R5[�|��e<9a�PH'����Ft�Sp�G���r��G�t��@�m���S[�p&\����������~-\����u�`�P�b�\��*����w�?��C&�	k[�}+d
A��As_N*��C�/���E&�G����Na(mn���
2����Jz��s�����&����_�=?Oz~*��5M��?��j�������?��
�2��AS"���yuj���#YJ
�1���
��� ���N�_��u������*�	%�H����=A���Q�\�O�J|j���]��l�������#��s���[3/L�}�����v�
���^g���3��(gQ���ECm{��T��.�}�9p����G�|��^|��'��g+�i!_�����E��,*�QsT^��S�����Q����M;:\q<����b����t�_r�� �f��6����ru������P���y���L��+�?<�t�:5�����`?C��p���?}����Y]����$�e�Jtp�3�yA���������D;��r�As��R.��NV�����_7��������j�f>|���X��,\3���;�=�XJ�j���;^�_���Y�)���`m��s'�YQY�M�F�Mz7��~N��e������O�/O�O����;�Z���IZ��J�~}�M��c�����8�R��:h�e9�t���U,�L�$�8�S>f����'�)_a'���g7�@���������$D�w���G�E?���L�����$��L6����2�����}1��9����f��`~�z�)X'��6������G�0/Q�6*��x�u�v�����4p4T>�qz>�d����8W�?��y��?$s\�{�o�����_�,��@ujLQ��x~�X�u�n�t�����o���tk� �Yj �@c0��������F�on�����^�/G�8��.�*�����6�X�6S��}����>�������s�&.�����g���-F%��RR`���M���3o�b����SQ����y����{9��CV����3���������95&5��
�fc����#��m�`�}����3��W,������fv����b�����2n�2f��������,�f�^��>^�zh�Z�\�T�1?���� �w����MxK����o�f�����x
|q�{���Og4�g�	�ld��%�kj9'H��I�g�-o��Mk�3�E���)i��7�k_��? x����bu�[N���./}����$�[m�s��r���_���R2?��*cv������Qwd�s���twNIq�Q���m&����P��h�E�svQd�Pm����6���f��������nst����������\=����
W\�1L�go�]�ha�g����%Dd�+u����9�����u�u��������~���XZ��)�/�#n\(�{��O "�5���3���?�������E���A�5g�Sm���!ys�Y�����~�@��B"��\a���!����U1���|��1{��*�1P�2R�������X:���pa�#;$_��=�Q��M��ln�����O���������xxx�*V*y�94gY�,�;���v��#92e'���+"��wT{-���]b��J���e�o��-uL����y�~,5���*`#4���T.SeS�,�=}G�&�dM���f���hq}�5�~%�Q=v�����o$Bj������!3�-NV��E�.��,T4�a��}!�c2����t�*jC��`A����#����A	����!��qz�!,�I����������K'�������*(C;5����!���HB[9`p	���q���_��l8k�h�!�-JRw��J�y.A���1	��Ikr�����E{'�*4�����Z#�*&e,?G���Dcx)f���G��:���l�P"����_l
�ei����j'�P[����R�t1��sdY7�1�/��F���� b
����TH�Q�U�6g���[���P�+�Q
�C�(�0�,����Q�5��]����$
�xBAf��!�D���1�o��H"�.L����4����gm����Z��mR57s0�)N8��s^8���x�@l�]������;t�4���F���U4�A��b�ea/j`%9��k��#	�@z���b�����k`���%Y�K��~L�e�����lGCR��vO}��&&y%����cZ
)��nT��M�P�d�1H�1�� Z
�*#�Cy&j>k1�'6��;y��$�7\c��{��'���
��{G.�0��}B��N:*7��i��a2��,�y{�xo"�>Y6����&g���5>������j��=~!a�@���������y�G����~bjo����>�����<���������why)p�u���}`��8r������om��u��<hh��4�z^TM�J�}����>�J�>�4�Bj���	'-�TC�M@12^E���@��\V�u������
������~u�;ae�F�E{@k���V�������2��~�z2��q9V�{��>YS�������K�5�����b���	�I��h�I��������t�G�8����&tuu|�.�����q��I�%;�Tp��a�?�(�k��9sP���7T|kh_9�j�Z��#vzPF�e�\�w
VL�Wv�_����1��y%v�.Y��V&k���{��2Dz%����'�K������A�lI���oq�����U���_]:�	�
^_��DpY��h]h��w�Z(�Q\��	�e�@�S�S��5o�\g�����:��������,[-nVS��e �)u�k5a;�4"��k|���\��||��Xj1<)���5����j�=S�`N2TZ�nz?l�~@�[�7E��=),8&�@�:p�7!��$M�Y�u���;�c����j&��HT��hW���m�0����a�V���G��W�VCW�����+�UOo{�/��&������=`�����g��xn>��!�N��s�Y�eq��5�&���

V���F��*�mGN�c�d�|X���I[2Z{�B�{�n{�6C���W+i�|�M���W/����A`
m�l��$s)�Mf�yfd��ee�Yu��������)����b�Hz���b��m����)u���_�p>t=�xu��R���K���rK�j��s
qb�s�F�BK
5��-Nu%N�dDR�e\XA����L�+yei;�eH���#&L�W�22�N����J�;�SW/p�6�
�|��v/Gb���x�3�v�w�����W<����c�����5<���r&�`(���2�*5�Sa{����~��=��������{,K}�z��:.�s�{D|u����.c��m}Tnd��������p��V�9��}V�Z����|�����i�
�,9�]0�>u,�Z���7�����E5'�]g*w��_~�����C��L)��m���������,�
Zv��HZ����������a'��k��b������n�W�p�]��)�t��6�W�M�,-'��&�9��fWR���
}X�d��O�����A�%�D4���^���Q������C��D�i��_$�U8��<�
8�<�S� P-'��J]���j.k���o�g"�J�N
	������1Q2X�������Q��
�&���4���M�]@"d%if����x8��~����x������x	~���lG��B��$��#�%���N�?x	��@C�gCYF�����?�
���&���5@YFH%���M4W��
�-�H������F���jC6��X�U�� ��B>��n'�H�=.��
�e�A����|X��������HDN��U�[�x�Q~X�",���/F)��!�1:�����	(��A�����,B�&����(�t��+�9�=uy��D�0��w��x�8Y� l�!���Q0��.G�CY�I��_���
�J�b`���R/�C����`��^��C�/�p�/�3�uq�/��
[l��0���B�Rgr_��b"7m����E�R���rK5�����A���]�����\dE�������j�@Y1����xPp����������t��E�#PP����,�������
��c|=.��O�,A�a�T��4���=)@�����-�p(4q�d����.X�!��jMu���|8Y{�M�����Ru.]�Ab�[|B�����Cv��Gr��&�#QH'y������SHr���b��#l��2�P18��H��``C�^E�$��2�j�������R����G��fWmr@�sz��I����1��G���&qwS$k�hB�j!6H���5�{�����y�"d=8���N�#Z�q7�2+#q���W��?0LK�U~��X��L����6�[<>V����Zi��fe��9z7��-~5WUp|Z��Y{�\���JIh���p���a��P�3US�t�\a�����u��l�"/��PR����MyPlj��m�R��x���"��d����d)�����e�a�{
�?JP��gQ9mJ�m�)�w�o�I�|�h����a$�Cp7���JH<[�����!Pz�X�,)_���F��:���rf�ze���5�x�lP@����)��e��;5	x�I�)�"t�K���.�����tw��nr +1�N�w�Xn��{�l����m��r;WO$�3RE.v?�y6"�����(O��;p�'��?��[ ,$@���D���Ye�����"���W�Q�7(��U�~�
��&>�.����^V�������(��zG�q���<���[`~l�]�v5�	7�U-������Z�������R���y�+��cg��L��6S%+�
>��P���T��@���p[�!Ip�z�9��S��c���p!�}��Qj�E�~��R��GR�������d]�ASF�F�:�]=t���`����3h�JBF�w�%�U�Z	�u�N>���!%��d8K!u��AblKei��<l�*S2xI����R+m�
���{��������K��P�n��������:������H��u�������cc3U<����=�p����[�!��;nH�F��^����:l��vH��)B�IN���HgNaa�")l�3fW���_�[�q�����u���������x����
+��F�����������>WU~,�v��u���f�~���N!�)����z��
�W�sE����U���D�	��^�-
��������@����8c�K�=��:E@%�I�L�5�*^$�aD�a�@#��?��F!+�9�+�
��a�h*.�dP���[2�j�|��Y�Y,�xT�j/�@#Q�
��Brr��N�T�mZ93j����H14q�����������������i����+���,���N��,��(+�B����I�A�7���#[`�c�������aS�vt+���r	�d,���g������1�J^�[���t"��lG�H���w����L���&���k���"�Yd!�����]��@ms���n�7��6�Bf�qT���E?����J�}B�\~cD�e�iu?�	������\���	_v(�"	@�����6Sp�*U��qL�)��u���
N�eD�'"�y�����RE�����v�;���5�����zsG�`��\p
�_��Hs���?�[��9U<|c��E������)I����$�q�k��
z��Yx.��������qPqI{�w�F�L�Z
E�yaO�{�6V^[�K8������X�w��E���=��{��$�M��]�.�{�-� a�VcmJ�R�����}�y��?��1�K(�0(%��G !��
��7-0�X(�-�Z�����d&n�k��0U�\l'&h��$��7�U�o�;�����������63*e���/3aX�A�eV
^����P��G��D�V�)@2��6Lyz��,l�_�3��*��"��O�R�j��~����$;y�qON����$\A��~��`����U���_	�LA+U�T��N���!��*�l��k�o��\���]�a���'��2%1-��$��W��NB���x���/|C_��������E���2w�ZE�S����%��3
@��I�����q*��8�$7���A�jw��nwD*��Q�b�	���DFa���f�����R��C�bRS�����:H~�P!1�$�����,�����{o����)5�����w�������+3X�z	�c,Q�@7����
x��
lRq��0��sR������[���|��t�0����y^���'�R�r���$������K��I�x`�G���AL�w��� ��������;S�
�H~I�o�Z�k�L5��[2HD�t�`���Z����KV���lU�8�p�N�h��`���XN=*�:���)v��d��uF<���6y�^��� 9������H:���P�y���e
���=�&�jZ���Z`w�<��f�90a����\E�8U�8�<��u�6�^����8�������c��%����J MSRw
���#�I�b�.�x��fi�&�d��'|�\��o�_���I��y$�=k���8�}%�W >hJ}��;���Y]�v�����A~^��]���Y��q�T��B�~2��[����������K�Et������4aIHl7���"�74��g���N���y7����)9Ew'��:�c�^2j����2@�V�i�b�+/�(
u���0S%�5���(�b7��ku�:��������(��|h�\P@I�&/�i���5�v�P��{�{yt����h��j�A�l���"]�g���z�x����dl�dS��q��|�5�k�kejdh�@kg`4?f�O�p�h8��O
jG>�l����
Jw�����C���8{v�R�i#�.�L{(��W�2�>���I�g+ms�
��b^���a6e��%�vR 
��Dm�K�
�\a���8�@]$n�����&��6�����IN��>O�gIN
�n�b{p~�/$��p��?v��T?S��Gw����N����%U���zL���f��s
�
�D�s}7����
���s��V.���w�����:'a��|�����k��bU_��{0���������U��%dmedj�h���Q{F:kGKC+G�������������� ��o���-��R_
��KYX�:����s(�#���oha�����N��������y���?��w
����������	�g ;kc;C�������������=������������G����;�3���H���u>�'a�F4��^%�O0*(��cz��EuV
�;�c Z�����K\�)���8m�a�>�?X�]�G��Pf��N�3�Ek�:�LW��������7xF�����U����d�������j�x��k�E�Q�JJy�m��\mKQ�����7	��=im?h-����J�
��T�����dY�q��6�R��|�@9'y:��]":Yb��5��
��4�����(��e��9_��D�o�)Xe4�����`cw�7�������K�9`<�fo)c�!��@�p���	R�������5����	O������_4t�5�3���\����4��Q�i����VZ�$y�7E�,ES[��,���BJ������]3�!e+h#��4�2,��y{?������F��c)����s�RQ
�����AN�qm���&si/�����a3X�s�{1�����u�B��L^]�e	I�n�K���	��n����07����\O�����pH8�m�-*�J����i���z��P��l�<i���t��/j�������<������2��[��9v�[cR]n����l�zY��������5�#A+����Sk;6�r1!]��P�k������U9d�J���,�s�npXFG�A:����M[�E���&��W��^��$4��e?�����d��v�{b�(�"��u�uoAii7s�)l���O�����Hh�����P���������c�W��:u����V�	�&��]Mc��A�������&����x��40!IA�L��aW:#�;�K����7���%���0JQ�Gj=�s��Y&!<����m�P<����8�F0WQ���	���4
�m����2��j���Eo	�V�s`��Yp�o��y03�q��v�!&&,`>��9tf�����}_�m������`u8�s�
����E��L����l:���`E��o��_�D�/��4�hb}H�n�Y�%���2��;���?CF#]���l���rO���@d7`��o+���*��v�$	��~B�k��h��y�DUhZz��a[g�k87��w����� �`�713���J{~&�Cv+G8�.�~�f����3g
�w4�Z�:Gz��#�.�E�������o�c�#�q���wg�������=hZs�Vc���������\p�.b���j'������0��,�M����!)e&&��;�������L��p�^(��@/�>d�Y�~Xz�;���-7u_������#��G��`F�v,I	�������1"Bp
�Z����g�q���2���5�����[Hx�V�F��c��^)	�6�!�L���b����>�a6��{��+��H�!��c�>#��	=$�����'[i�vBt�)����R�IS��H��n?~�gN����"h�\�lu�S��$_kE��]��3~S�X^�������l�:u�d���5��w����d��UT@,��B)l$���!2���?S�������&�J�z�����[_��ok��g\X)����ii�,B[t2x�*���2`���H4R��RU��pN�|��7���;=1i��[Y����<s��Z7��b��X��[��	�G�]��v�r�?
��x%�L�kUH��J{�pS�w�{xp�w{��;�����'\�X�T�7��r0�Zv�w������Q��`�X�=s��	����2F�����@��`r��`&��W��e,�wS+���i�jm����}9r����iT;��,'�s�Pt�������vo��s����,O����W������N�?qT�M��9����b��t�@�x^/�mm�I���_���L!����	8H{NJ1#��A)�f���T_�>&�X�:5��L�����SRv���\J����%�tn_���F������"=dSJ�)���#�F\��u'���K�|���5<�tK^�|z0��sX*^����8��&\���K||�=��b���{�E�
�u&r�B��KiN���iBsuM�FJ/.�n��ae����;.2��6<�z��p����)��x�@�����K�5[z��+��;�:�U
��Z��d3��6�������Z����k�9*G,������A5O��*��q��$j����x"������w����u�Eula}��=������/
***h�`G�*HYd(�h�D��^klD}�hD�O�%>��'j��
A�Fl�����X`�/��H�;3��93s�s�c6��M���������N�ip��g��-i}n������}�}����9M�h�����L�+��+-����	)�L����?xg7<)H_��������|���sm�]��r�O�r����c�����o��N�\�
�5U���W
Xe�����F�'���#*~��}������?�����#:��N����r���I�4;}��:���X����u���R&��"���*HX�|��}�>�2����d����-��������>�9����C�i�9��S��7o8��yF\�T
��������������/o�v�����w�����M�����������z�$I��Jk��
�:���[����?
�I�8����[���P���S=�6� ��l���A�����~�����8_q���ND�X��c�e>��w~����7�Lw�5`�����P��{�F�����}&V���|\+p�'�f�b�"T��/E����j����e������4�c����b��e��f��=S�W.fY��]�^vaC��;2���Lsn5+g�_D��u&���>���kF�����ej�U��K����"W�O��u����'_����|Y��Q�G�����}u���[;'��>�N�����nI���y����z�vZ��U<��o�6�k~�y`���sV�5'dl	�Ri]B�1�7�����?����0����������b��/�����t��t���\b�?�yp6�������d'���(���^������S���w�F��qw����8z,�f���]��?�}�x��>_\����{�U��������B���������|�qe����[�
C_�pm��|��B�6�+�\�?7���s�u�3��'gK7|���p���,����~�Z�}9�N�S]/5;�zgX��F+�;��6<����������w�kx:���)-G���^�a�]}���pd���]R�-5<}�������Y���Y|���72�8E|�����;�f6W�z�&���c�����G������������������=���\�j��Nj�`Rg�#]��u�����9�~�$?���A'{@��a��gY�N����R�;:~2u�����I��L�w���������=���disn�[��)����k���HLf����N����7��`_h�����>��a���Z)�v.���S�{g����i�|L��O������u>��g~�#.�M���w��[s������{�d����q�6Nv��T��R�|�����;�rX,��a�����#�����=I��6����9y�3"/�
�Y�s���C���^|����=��f���������c����[�a������Z���,E�:�~��W��I�j� ?j���
�M���������}6o���5r�?x��D�O_���W���F���9�=O��{�v�l��M�� ,��_M���H���z_�V��.���S��y�)�����4�WU?������F�����9:m�{�;��cNz�������B����o�>�[�r�c�=��{�*M��J��O�t#y��=Kn���+����={�|A������.t���E�M!u7�u��(�-��g��5#R����2������&��Sn�J���:5�]�~�]��!Y����eQzM��������\�����zdUv����I�<��k�3����pj��F��7���~pI���iU�u����WG|_�R���*a���9�e��������C��
N�������[?���O�������9�a-���G.�j5r�[_<�xo��Srl�n�jK��IYO��w	'
n������	+g�6�|r���!m��]�q/����������������\!��z��[�O�����U����i�����2��M��d~��G���E��S�������8U���g��p6�}�aD����6���Q>1"��Y���vn�����'�jH\�;e����O�)[�������g����W�p�p^��O���r�~���?�~�f��C6N�r�Y����>?���t�BV�]17�
�������F�y�V���_����<��z���������u�~��o�o��������j�����"G�j�a@�����K�cz������4��s�����$zex�<��s�|�����.K�������������{��f-�&�kR�������V_��VQ�����b���zu[����N)C������C�m�p�$����q�y���i�2>L���u��_.P�L���Zy��{[��T/8������;����}�B��_��!o�g�hj3����w�����?��;��Tm��vN����t���	��I��zx�`>+����1-�����J|�������\��0jA�[����&N�2���m��=�~R����~���~����m����[�K��}~l,��{��r��}�#����o��_F_	��e�����f���{o?�\)��/�>�5����������/�s
��uFX�4�`��M���#a��W�
�
�XJ�����y�Z��jlV���J�����xi�"�?JiE� ���5�Z�;����eM�0|u���0����M����nm��[�2u�;�(�4`lY���l�����1o)`�g�����-�'mk;���c��.���e�����VZ%������VZ1����?�8���Z�g��"|������FM=�E�����7����.��f-�Xi���\f����Yc��Z��[�`����pe�U7�e�x��Y����H����Y3���S��H�
���.f
��Y�^���Se���7��B������(�(UX����$m��AQ��|GG�~��(���P<b�h�����@P��6�ko"��*2���J���a���Q~�n��AD�^~A��	�����3�+:.ZK�j���E�)J$��%D$��i|E����F�����M
�%�`+�rb�I*E�P'V�N���3���A�U'��$4K
�\��a�r��>c���>e���S��M�k����c��U�ZU�J�}+R���"��n���;>)N�������uR�������s{B��J���P]!��NU�(u�7�U'x��%htT���}h>���
�P�DuR��(��GAsYq<��#d%�H�9�d�sK�,Y���HI�W�i^�9+��I�&��L���4F�b�y��4Q�	 Sub�6���B�`�'�"#�H��m�9N��0#��"N�DF,�9K�([�h�������`��L��RH���Q�DEL�R,C
�+9I�$�2[�� ��IF����$�����L!jJ�����X[���U�����a�(��::��0c`�fx�cIV�0�8����%X\H���0�(�+,���$�W��eX&�[���[Q��BK�3Z���	K��U�}Uc��a��:����(��H��u��IAa��8���}��}
O#����XI�2-��`^�����0�:�����@��$9O���)X1��67�C�,��$�S L��&?��������$�2���D�b��B�M�`a�{�RT*��m��:%E���Px$����I��|���O����8������b�6�&�e9I��x9�N�.P���<z>�e2�1����<]pI
�+�|hg�X2\%���j�:r_� ��E�I���9�V�������f�@���H�X"����"�3�K'��U�1���y��)���)V
�7�J�|9�3�.Mc&�K)� ����P��7[�` BF�*sK������B���P�2��^>R�\���K<,�x	�K�n����j2���&)����T=�K��>���$���`��d)kS����vc0A���� �M�B,)
\�	G&�i���Fa��Yn]����
���}Y���P�R�T�5��l�����E3�[NP���n�e,}����M#��E��G�^��`�D����OVE$a��7%
f)�[wF�g���*'R��-?`��?��@���?��`�r��^=�ba�����2z�,}|���1��"Y���_�K� q�������
y��F���F�W��W���1t�X�V���}/�`����
�-M	 5I�E=�E{�Jv��������V��������
��J��PC_��V�>m[J"��B��"��$��O�a,�yF��O�$EF0%���D	��+���$��d�Kd��A�m�y[L�$
���d�������S@�4g�f����_�Ly
�,�����"2s��*�E����Z��.f^��)���u��q�����%��k�`���51^��(Q�f��
0�Y0&�%E:xU��RQ$����L�����q���e����M�rX�,lg{��*vD ��b=�����I� k��+��l�����5���G�&�.�K9��A%����#�k�M�W��$q��"i�X�����>�OX���"d�	X@�e��y�R����1��$�I`�Q���Z�R R
�Cy�f�����_�$�;�
��D����R��H����@G��Z��#��p�W_|m���.bu[
���z�y1$��8���$G�����,���&�Rz
��,LbA���K���IR�H/U�X,����@:�����$���A}�o��%��_@�"�M]G��oB&��rs���������B�9����������	0��VJ/$����5t�Z�V�2�������K��������@9�p����'i�~XI$�R@_��/���V]�'������i;��+��p���	������n��<���������+p\���N�������?�<_J������G|(*��?����M��?M;��/�J\���l$t�HK�����hnn��s8�FRWQ�E����yW� ��>��D��|I�1���9j�����������nG�$%$�	�P���JX_oZ�'��G~��"~���7����gZd9RD��(qH y����BS�N0��8Ii������o#+f#k������A��SE���!����	?�yE38����/#V�J��904m�3�q�=Fp�!(,m�4��cu�[bE�����L��7��B�����p�b��I�QEs
b8�F�5� ��(��q{��^*i�aKt��7�v	0
?901�.	���-�_�@���@�/����:���_���}P�~�{�CI�M
��@��Q�)��M����[�C��76�8�d&KZ	wSE�(|8G�%X]������B��kB#�q8D��Z��%RS	��:�^ouV�%��E�k5o����W�g%$	�{�6*,�1��q,�k}I	�����kYK�byN��)%���H�?3aC�����#ENZse�G��07
����sq/�Q�����p��K�d�z�Nwy?S��%��>��{���������p������_� ��m�������Y���g��5	*���F����}���(U�E{��hO�V���ID��:}����pP�$�����������/��ZZ�$�������U��S_,_�/w����_��)�sL��E�!Y�������nugqA'H��^g��h@��s`b��C��6��Y4�qJ���k�����Rbae$9�:��k0c8w���������������y��;��4#�����������ws��E�M�s��W����;PR�?������
���/��ZZ����_r�o��%���'�w�E����)T���R�d+�++���M{�a�Il�`��h��.��KZ���L���o'�gXK����B�_�D"`���E�psSIRa��B���q"��d�<GY��L&��pVf{"^�7b�����8����������(�(J�8q�@�%/�<�h�eeM4�����k+���Z=�"Z���K�� ��E��"�h���r� �v�����<F2�5/����-�$z��dq2��Zc��<�-3�b��Q<���rR�&�ZV	�V�eA��tf)V���N�1[�p7(%����@_iFH�E���0�ew���s���iy��f�=���u�H$���
��<��t�L0�.�\�A�Cj'M3\i��Enflh�J��?26l�.mN��<L$�cg�t���f/k��`hF�3Mq���R	��g��������<B�$��,��
�@��i"#r��&���4�l�B7 
_Q����/*YA��q����@S�I�o�9!{�m��'��-���pB����7N�K&2Y�5�Ly}�	���BI'V3�QYe�EVf06�(���T��pZ�bp����$�-�m���6�����NCg�0�(P_"�K�b�/$2�X�Y�R�'$���TJJf��yC2��=��<��}�GS���5z��]��cG�|])"v0o9�_�:����Jg�]*$�G����R�i!�.�N�4	Z��k.Z�B:��)W.K'�Y�`@��l��H������������u������q-9���Hep���$�D��n�h5Q�����;���������:��2�:7Y0����_��?�
_]��B-Bu�.����C�)�<� �����L/���?�_N1�_��?�4�����+��q��,�0��@���9��<W���u����/�G>�#1$�����<�+���?���r��q����W��n�V��

#48

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Rafia Sabih (#47)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
few regressions,
Q21: 52 secs on HEAD and 400 secs with this patch

Thanks Rafia. Robert just pointed out off-list that there is a bogus
0 row estimate in here:

-> Parallel Hash Semi Join (cost=1006599.34..1719227.30 rows=0
width=24) (actual time=38716.488..100933.250 rows=7315896 loops=5)

Will investigate, thanks.

Q8: 8 secs on HEAD to 14 secs with patch

Also looking into this one.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#48)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Mar 14, 2017 at 8:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
few regressions,
Q21: 52 secs on HEAD and 400 secs with this patch

Thanks Rafia. Robert just pointed out off-list that there is a bogus
0 row estimate in here:

-> Parallel Hash Semi Join (cost=1006599.34..1719227.30 rows=0
width=24) (actual time=38716.488..100933.250 rows=7315896 loops=5)

Will investigate, thanks.

There are two problems here.

1. There is a pre-existing cardinality estimate problem for
semi-joins with <> filters. The big Q21 regression reported by Rafia
is caused by that phenomenon, probably exacerbated by another bug that
allowed 0 cardinality estimates to percolate inside the planner.
Estimates have been clamped at or above 1.0 since her report by commit
1ea60ad6.

I started a new thread to discuss that because it's unrelated to this
patch, except insofar as it confuses the planner about Q21 (with or
without parallelism). Using one possible selectivity tweak suggested
by Tom Lane, I was able to measure significant speedups on otherwise
unpatched master:

/messages/by-id/CAEepm=11BiYUkgXZNzMtYhXh4S3a9DwUP8O+F2_ZPeGzzJFPbw@mail.gmail.com

2. If you compare master tweaked as above against the latest version
of my patch series with the tweak, then the patched version always
runs faster with 4 or more workers, but with only 1 or 2 workers Q21
is a bit slower... but not always. I realised that there was a
bi-modal distribution of execution times. It looks like my 'early
exit' protocol, designed to make tuple-queue deadlock impossible, is
often causing us to lose a worker. I am working on that now.

I have code changes for Peter G's and Andres's feedback queued up and
will send a v8 series shortly, hopefully with a fix for problem 2
above.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#49)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

Hi,

Here is a new version of the patch series addressing complaints from
Rafia, Peter, Andres and Robert -- see below.

First, two changes not already covered in this thread:

1. Today Robert asked me a question off-list that I hadn't previously
considered: since I am sharing tuples between backends, don't I have
the same kind of transient record remapping problems that tqueue.c has
to deal with? The answer must be yes, and in fact it's a trickier
version because there are N 'senders' and N 'receivers' communicating
via the shared hash table. So I decided to avoid the problem by not
planning shared hash tables if the tuples could include transient
RECORD types: see tlist_references_transient_type() in
0007-hj-shared-single-batch-v8.patch. Perhaps in the future we can
find a way for parallel query to keep local types in sync, so this
restriction could be lifted. (I've tested this with a specially
modified build, because I couldn't figure out how to actually get any
transient types to be considered in a parallel query, but if someone
has a suggestion for a good query for that I'd love to put one into
the regression test.)

2. Earlier versions included support for Shared Hash (= one worker
builds, other workers wait, but we get to use work_mem * P memory) and
Parallel Shared Hash (= all workers build). Shared Hash is by now
quite hard to reach, since so many hash join inner plans are now
parallelisable. I decided to remove support for it from the latest
patch series: I think it adds cognitive load and patch lines for
little or no gain. With time running out, I thought that it would be
better to rip it out for now to try to simplify things and avoid some
difficult questions about how to cost that mode. It could be added
with a separate patch after some more study if it really does make
some sense.

On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
few regressions,
Q21: 52 secs on HEAD and 400 secs with this patch

As already mentioned there is a planner bug which we can fix
separately from this patch series. Until that is resolved, please see
that other thread[1]/messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com for the extra tweak require for sane results when
testing Q21.

Even with that tweak, there was a slight regression with fewer than 3
workers at 1GB for Q21. That turned out to be because the patched
version was not always using as many workers as unpatched. To fix
that, I had to rethink the deadlock avoidance system to make it a bit
less conservative about giving up workers: see
src/backend/utils/misc/leader_gate.c in
0007-hj-shared-single-batch-v8.patch.

Here are some speed-up numbers comparing master to patched that I
recorded on TPCH scale 10 with work_mem = 1GB. These are the queries
whose plans change with the patch. Both master and v8 were patched
with fix-neqseljoin-for-semi-joins.patch.

query | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------
Q3 | 0.94x | 1.06x | 1.25x | 1.46x | 1.64x | 1.87x | 1.99x | 1.67x | 1.67x
Q5 | 1.17x | 1.03x | 1.23x | 1.27x | 1.44x | 0.56x | 0.95x | 0.94x | 1.16x
Q7 | 1.13x | 1.04x | 1.31x | 1.06x | 1.15x | 1.28x | 1.31x | 1.35x | 1.13x
Q8 | 0.99x | 1.13x | 1.23x | 1.22x | 1.36x | 0.42x | 0.82x | 0.78x | 0.81x
Q9 | 1.16x | 0.95x | 1.92x | 1.68x | 1.90x | 1.89x | 2.02x | 2.05x | 1.81x
Q10 | 1.01x | 1.03x | 1.08x | 1.10x | 1.16x | 1.17x | 1.09x | 1.01x | 1.07x
Q12 | 1.03x | 1.19x | 1.42x | 0.75x | 0.74x | 1.00x | 0.99x | 1.00x | 1.01x
Q13 | 1.10x | 1.66x | 1.99x | 1.00x | 1.12x | 1.00x | 1.12x | 1.01x | 1.13x
Q14 | 0.97x | 1.13x | 1.22x | 1.45x | 1.43x | 1.55x | 1.55x | 1.50x | 1.45x
Q16 | 1.02x | 1.13x | 1.07x | 1.09x | 1.10x | 1.10x | 1.13x | 1.10x | 1.11x
Q18 | 1.05x | 1.43x | 1.33x | 1.21x | 1.07x | 1.57x | 1.76x | 1.09x | 1.09x
Q21 | 0.99x | 1.01x | 1.07x | 1.18x | 1.28x | 1.37x | 1.63x | 1.26x | 1.60x

These tests are a bit short and noisy and clearly there are some
strange dips in there that need some investigation but the trend is
positive.

Here are some numbers from some simple test joins, so that you can see
the raw speedup of large hash joins without all the other things going
on in those TPCH plans. I executed 1-join, 2-join and 3-join queries
like this:

CREATE TABLE simple AS
SELECT generate_series(1, 10000000) AS id,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
ANALYZE simple;

SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id);

SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id)
JOIN simple t USING (id);

SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id)
JOIN simple t USING (id)
JOIN simple u USING (id);

Unpatched master can make probing go faster by adding workers, but not
building, so in these self-joins the ability to scale with more CPUs
is limited (here w = 1 shows the speedup compared to the base case of
w = 0):

joins | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5
-------+-------------+-------+-------+-------+-------+-------
1 | 10746.395ms | 1.46x | 1.66x | 1.63x | 1.49x | 1.36x
2 | 20057.117ms | 1.41x | 1.58x | 1.59x | 1.43x | 1.28x
3 | 30108.872ms | 1.29x | 1.39x | 1.36x | 1.35x | 1.03x

With the patch, scalability is better because extra CPUs can be used
for both of these phases (though probably limited here by my 4 core
machine):

joins | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5
-------+-------------+-------+-------+-------+-------+-------
1 | 10820.613ms | 1.86x | 2.62x | 2.99x | 3.04x | 2.90x
2 | 20348.011ms | 1.83x | 2.54x | 2.71x | 3.06x | 3.17x
3 | 30074.413ms | 1.82x | 2.49x | 2.79x | 3.08x | 3.27x

On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <andres@anarazel.de> wrote:

I think the synchronization protocol with the various phases needs to be
documented somewhere. Probably in nodeHashjoin.c's header.

I will supply that shortly.

Don't we also need to somehow account for the more expensive hash-probes
in the HASHPATH_TABLE_SHARED_* cases? Seems quite possible that we'll
otherwise tend to use shared tables for small hashed tables that are
looked up very frequently, even though a private one will likely be
faster.

In this version I have two GUCs:

cpu_shared_tuple_cost to account for the extra cost of building a
shared hash table.

cpu_synchronization_cost to account for the cost of waiting for a
barrier between building and probing, probing and unmatched-scan if
outer, and so on for future batches.

I'm not yet sure what their default settings should be, but these
provide the mechanism to discourage the case you're talking about.

On Wed, Mar 8, 2017 at 12:58 PM, Andres Freund <andres@anarazel.de> wrote:

+static void *dense_alloc(HashJoinTable hashtable, Size size,
+                                                bool respect_work_mem);
I still dislike this, but maybe Robert's point of:
...
Is enough.

I this version I changed the name to load_(private|shared)_tuple, and
made it return NULL to indicate that work_mem would be exceeded. The
caller needs to handle that by trying to shrink the hash table. Is
this better?

On Fri, Mar 10, 2017 at 3:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Mar 9, 2017 at 4:29 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Yeah, this seems to fall out of the requirement to manage a growable
number of partition files in a fixed space. I wonder how this could
go wrong. One way would be for a crash-restart to happen (which
leaves all temporary files in place by design, though it could clean
them up like a normal restart if it wanted to), followed by a very
long running cluster eventually generating the same (pid, set number)
pair. I think I see a simple way to defend against that, which I'll
write about in the PHJ thread.

I am not expressing any real opinion about the idea of relying on or
suppressing ENOENT-on-unlink() just yet. What's clear is that that's
unorthodox. I seldom have any practical reason to make a distinction
between unorthodox and unacceptable. It's usually easier to just not
do the unorthodox thing. Maybe this is one of the rare exceptions.

In 0008-hj-shared-buf-file-v8.patch, the problem I mentioned above is
addressed; see make_tagged_segment().

Thanks. I will respond with code and comments over on the PHJ thread.
Aside from the broken extendBufFile behaviour you mentioned, I'll look
into the general modularity complaints I'm hearing about fd.c and
buffile.c interaction.

buffile.c should stop pretending to care about anything other than
temp files, IMV. 100% of all clients that want temporary files go
through buffile.c. 100% of all clients that want non-temp files (files
which are not marked FD_TEMPORARY) access fd.c directly, rather than
going through buffile.c.

I still need BufFile because I want buffering.

There are 3 separate characteristics enabled by flags with 'temporary'
in their name. I think we should consider separating the concerns by
splitting and renaming them:

1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp
member to isSegmented, because that is what it really does. I want
that feature independently without getting confused about lifetimes.
Tested with small MAX_PHYSICAL_FILESIZE as you suggested.

2. The temp_file_limit system. Currently this applies to fd.c files
opened with FD_TEMPORARY. You're right that we shouldn't be able to
escape that sanity check on disk space just because we want to manage
disk file ownership differently. I propose that we create a new flag
FD_TEMP_FILE_LIMIT that can be set independently of the flags
controlling disk file lifetime. When working with SharedBufFileSet,
the limit applies to each backend in respect of files it created,
while it has them open. This seems a lot simpler than any
shared-temp-file-limit type scheme and is vaguely similar to the way
work_mem applies in each backend for parallel query.

3. Delete-on-close/delete-at-end-of-xact. I don't want to use that
facility so I propose disconnecting it from the above. We c{ould
rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.

As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?

[1]: /messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-shared-hash-v8.tgzapplication/x-gzip; name=parallel-shared-hash-v8.tgzDownload

���X�}iWY��|�_��~cZ�e��������Y����dYR�3S�t�����n�h�vO���T��7�.q#��:�c<������a��O�j�v���4����G�;��������5���^��hv�^����6������(��'��P&�i-h�F��e�����Y�������jq0�i�&�$�j�`��i8��V������`u������ir�������h��7���5����}�_En��[N��(������.{���n>���77�F�p���w�|znz�a��A�niO/I��.�n�k�SO:��>����|����y��u������Q���<xl���1��q�S���K��qP��I0����wU��yx!4�� �1n��G�b
����d�x;t�?�����qB��6�~��sj�i����0N8+� ���T���SH&p�`P��,���������v��:��y��M��inZ;�/���c����IT�&8�7��oz'o�_��w}����;���I��7
F
�T�"��B�/,�(�q��)�)���!�����Gp��@�D������?��z�(�O^���~�V=������`[���K|����;/��}1
�a3��3�
V`�����m�S\z�V�p�nv��_�t�|s�;\}�[x�e-`���g/�����Fg�<�Edm�V�-�f�RY�M���z�f�]m6�
��x����8
O�56���S\�����m�#o�a
x�^L�sxC�x?<�
��E3<�c�����Mqu1���f�����Y���������y�Ep�<�����(N	-��_���]x���������.��Fe�U
G��VE#��/@gn����j�yP/������6t��x�s����=��S.��^t�?M�x>���!��p�8A�����YR7-i�����4�]@��VU{4i��c�rsF�hw���v�U=D�P[x�N`����������v��,��8%@~�h�}���|<Np\�;��?��E�vh3:q��e[DxS)�����
�w]�Y�,� ^)����cp������z�;�(IB���z<��{�!P�d�) ;U �[@�}��F���j�`cA���6m6���W�kV[z�N��(	��]�*l
�9�jJ��)d�v�;�5`8��B3n����k��A���H��Ku~���)s���f~	(<��t�+����,�j��u���$�5���h:��=��?}�C��S#� N�����6���y
��Y�1��@���N
w3v�f|FU����_n��;?�]�������+Qi������g?_l]z��3������v���'�M�i��`��?��jWv��e33���\�y+�uC�J��q���_�����������W�����?�?#���w�'�z����X��b������	:�_��`�����=y��|!���h�K�&8�k����oD	�C�F�g��`HZT�Yd�
�>��Ix���(X�f����[����f��%�5�xsc��x���1���Yt�wi�\��%F���#zC�hR��W�z�n��RF�$�b�!���������t8j/��'<�N����I���8�O����O"jl��4q��s��k�&�?�@�+�W�&H3�t�]���'��y<	}��������5��7�' *�����4
>��AE�f�������8���6�����;m���O<��Z9b���d\���j�U�&"�-Z�3_T����/��	�H����XY?#W�2-Xzt���"��MK�eb�������;Q$4�A��<����Y��K%�"Qx�8�vG��^qO����P�����l{��F�z�/Av
��7������md��~]���W!��#���t}��M�dn�L"�}[�H�|��`����T��r8U"�R���Q�
���_.Y,^\a��/�~���\i�������xI���i���*H�R���w���&p������&Z'�%_�G���{\gq�)���7D:B]��/zS8
,�V6�:P�c�)����G����?��)��k`H�������t[��fH=P�f��@�|7��hK��wM������	��^�-�z�>R�"<m��I��h��[����6��V�-`���%�x���C��'�q����gf>�S_��,�a��{�n�[m5���C+���j�P� ��������D��Q�����ko�:R�8����`�1
�>�x�O>�U�������,����
����Jl�b��p���f1E��	�by�R���������%99�`�9�d�`���@��Ec$I:~��Y��u�B/�?�A��`EYl��u'�;Q�.h!*Xp���o�����a���W�.�aT��Z1Ck��_tuHf�0y���^�o�����|I��;�Ns�y���v�P���;�-*hg����
's��mZe���)�z{�8���iF��E������.����x=������;|
�������G����&@2$��t�q��K	i9��8�Gt�������"�������	� �����\��0��i���J?��1��e�fa���`>�!A��Y!:�
�,�e�E-q���{z�Hi�h4�mPS�[�����?��g������I���y��������o����F��O������.�k���Z]��^�K6i(S�6�F����A����w>,=Z��t����(J�����<���
U�4��=���OA���T�x������o�,G���I"�R���
	2��4��D����0��&0H9.a2A��kn�N�g�7������.����w���%�O�v���)
��6�ce=����:}1�����"�W�'?x��7��T_@W����+_���aG��G8T��1����~�<�0��0����ek_��	������u�-��T:���(����Vi�Q~�H�F3��Z���?Hi��� ��;���������T�xx+���N���)~���������nDH��O���`#45j3;n����;[g7����������?���6�����6H����ms�a{���0:���3y�����.�i�eN���mY&�w+^8�����<�Y���0k������N������^��x������jy%w�<����^#��i8eQm0G��s��y>�i�������H��
��Z��m6	������qm�:Cc�H�5�_�T�����9J
����c��MY���-�IW��\3�"80���aK����{T�o����"&a��'���NR;7��C�:i�e��U7�CTH���~�z������=���#�Y5�*��tZ� ��O�6����<�*S�h"����&��D�+h
�nQ���@�Z��
n����a�-����|,�Z����3�**.m�>d��-=@�P��]*��:�fZ�^�^*��q���pb�� ��E|c�����z�/�z�)�WW���\o����>�������(��j��V@�H�����d4���)�A����N��~��X'j��$86�A���%��-iy�D����l?&[�iy��FZTchO9e�g�8h���m�6g\�LGN�����e!mN��tQz]����x~N�[��A��{�!�b2G��M��x!�FZ��#�^�k��(�������8��������)BW�DNPz�K$�i�h�T��������bE+#6��$������_�lt�	0�	������e�=���5�4�]D�������m����C����:.5���%t����Z��
�E�:���	:���+��~��I��_�����&������������(�3��H��O�=�}��+�sW]xuo�t7a�	z���?���w�=��V�w�\��]��<N����i�j�\�J<���\[5��C��������9V��m35��z	�s�n;J\�k���Est&y�F��bJ]�����<���%�4#��/�o(//�V���J���H{
�Z��� L����iV[����/�)����+��3��?��7rG;�5t�4�d�m C�1E�zF�����ax�����w�Ab�:B1{����so]��
4��e�@0��`!?�27��
�r�N��@��,u�}4����Z��z�.	���A%el������W��r(��p��JD���9�������(ow�0
�1���"�+<������,��:��v3J����>�Z���#�z^�6o��k�����k�;?&c�,���eP�T����j��pQ��e�x��vo�*��v���,��=Wv}+R�N�9���nA'&���#Y-u����N>S��jctW��Uj�5w��M�T����%*�Sq1�t��$�M���\j��p���������E�����"�0�.\����X����(�j{nX�E�r�0�DZ#��?� 3{4����)�mk{�x���C�3*���L�%�Km�n#��*R���4�1���M>U�6����-&������I��������o����K�z���m�'��;2:]��p�-{9�Q1[a���z�����S[�����n"2�@�X������bb��4���v���T�q�b��ph1t�@�(�T����#��*[�l��j�s��S����I����$Y����n�+����Cub��+6.�o_�g{E-��i;�?���������%�?Z�V#����v�����{�\�)����Un�!�q����T{����p��?����1���f�����9���K�rZ<��CQ��E�6����A������+������)]U��w�?!i�U���M}8��0�C��f�����z��S�������Q;u�[6�f���*�a_�g���L��_G�{�I��K�����ZM�yw��F���y��G0�7���������`<L��}�O���#��~���c�]D?��0n�paTff���
@���<05v��M~���y�+���T��������]���NK��h��!���VI%��e�����3y����|4 ���*1���d�^8b�?�&���L
��p��j�9$��_����A�X�*����q����`�e=^��(��4a�e
�����e����	\����
�/r�:����N�����Z�M:aHY�9a����"�{;��?Y���lmP�Q�X������S�>�k�T c��0N�*�>'I������@�)+��N�YgP[�hZC�V�uz��'�T[GGvv�W�o�2w��I����#"3�Q`J��)@��\R]J�@ga��������K�/��w�����y<Z�
�Z���T�����x�r��[t�J �!����NWX����Ew NI j��w!�\���+Z0n��8�{����3��UW>)�J��xGK�Vx7��]D+9������<�5����(��pP��7�?�qp�X����B��y1x�7�Q��q�Rt�E4�]l�6k�,�M�(�x	u3���+z�dIyDi\RW��#��n����<�����M�^���`�9����O����/�8	�fo+	PY����gcJ����]1m��-�
��On�hK�o�����H+�E��I��8 ���,�,�LF�"�����4L���Wea�XMR�`J�%:�2��
�S��sP����3�AG�
��69�Z��4��?kg'�A�E���J���>(LQkl���%2�C�t>
��W�w�~A��[�Ol�.y���5�)�#;��4�����~����6;Z�uck������^���v��2nt!��z��.�j��K�
P�Zq��Ta'�;�N$�6:#��(���+�n�M�W��bL��
��!��~a�:��OB!zKZ�B����@KO��s�N�t�<\s
+#
(��8��Q��(_������f��b��=-f�2S�M/����]fF#cv�������Tx�"���,^��30��
�V���Z�V�A� M�I����N(f`���.�x]�wA���n���q��hb^&���&RM��K(�W����G�>l�)x��N8�`dRQ���{����v=��������I��!�,���xM��-�S�%�Ca�9�\�]���3�������=����8�x}����$�w����f��������!�Md��Y�i���(��tpG��/H�D��a��D��r��}���p�Qj���J1�6^Qq��{_2 �����h���J��%p�$O.0��$0bT�p��)j�EkB�r���w��g���:�|N����Yu.$hh���X8���/��\�
g��g=E����������8��~�7�w���[��p7�
v����9��M������V�~8��N���H��P���E����c]�y�a���&l������!���w���o��?[v�m�ze�v��1�z/_�r
 ���Nf�z��_x���1i"o6I����������_�#(�P������^r����1�k4}�M�,��.}������i����a@%�)�5������&*D	�&x�I���F��u��A@��;>�N���1o@iT�1��"�f�6}m�ss5R�z!���K�;)�T�����gM_���(��1�L�H�'l{���N��hv��"Io�������	�W��`�0=��s�gK �U�~w�E��O���I�9��!|�zXL�/�T��se�v*&�I�������#P���	�QL�E����w�!���a�oL��n��rA���,�T����.��V�)���.F8}0"��O���%;AL����`��Q�wI��? �i�c�����-��X�r�G�
�	f���4��`s�V���f���<?�Sa��gG@����A�w�����S
��I�.�!)��Z��r���R����l�S4�?a�"��#%c�,��Jk�$��'�3����q��}��~$&uY\^0���[�K"���X��HH&��=f+��h�	F�|~~�y�;�z/	1N$W
�8c
K4:��&�<%?h�'='h�������+�>�$�V��wg��sV�JC�a�&��c;&\��7|�h�����O:�4�z�e�Io��!/!	M���
�5�		Oq4�0^%kxd���-0��|�;(����j
���A�O��s�[�#�=�iZG&�����wc��:�a4��b,�g	����p?�d��������O� p���-B�
�C�sBA����E0��mt�7���H�9#(��c� �W��S��*������5��s8P���H�t�r�PX�?���K���d����[Oz�GV������5:��e�N����,�J�H+#������tb����F���G4F��/a�B�%��@��1�����V�;cRz�Mah����!��1�.AJ��t�	�Gr,��� �>	=Y�����FbF�N��4<z�!���3�d��5)<��4�@�2��4���K��n{��B��,<H�M�a�����SJ%��������@����	e���'��������4�E@�t��L��{��� V��S��$�E0�)�J���>fh��D4Mu�T��R�6�-�#%�#�b�5���@{�����(*�W~�P�����G�1O<�|��JL=I��	L�;��MMd�j9���z��1��>�e�wnQ�m��_�'b���_��������U���e1��W�=);x�a0�AH5�dI���`����#��y!�@�"c
E ���=z�����+��6+*�:�����A2��
����C�W�7g��N�o�./����")��������S��'3��%��'�5"5~��$B%�I96���bR9]�%��)M t���[.b��sD���j�9^#��9�����sV������Onl��&D����?�w�e���Z��d[����Ri�h��g��<|�^
�k/&�T�c�W��>w`9���'�����E���dH.���������'i���ba�G����y,��X#B�T2��)��S��b"�E��H�V8m?��OE��o6��T	�M�n{��=
�F �=���=K��0���/�I�KJ�����aRIB��*����/�;����G
Z�D�F	>���_A.�O0d3o�����wffW�/��u?��i��[�A>��YKX�2��d�FOF������mXp�N�B�m,$�\�I��-
S�ES�ArI8�HW-��
����O�6��>������J����#I�������d���49�Mr��Ba�����Q�����Wx���XG�x��y���c�rV�����g�q��*�J���2-yz��k��M
=��J
W��i��)$I�/���
�R���L��5�L=2��W����hi-��3L����O����9����6�`u�5TETFs�������B�h��8�� I-'���M#�gF��(���h�����$�����6M�f�drc��	�k�D�� �-�@�[���p��b�_���`Q�6��1��F������/�TJOQBCM�����|o�Z�Y3���#�T&v58���w�8AJ"��{����|*�S�8)��l	\��h�{�����VL�#�e\5p��(P�b��;��	�sr7Z�^m�L��u�������=`H
rt�m|dm�ygD�=r��y�Y\���cPT)����1H7VR�_[rA�������ss,����>�aN�B��������k�x�R[���to�Y�)L2T"����XY��
�F�U�����\�|��|G��s�����KM6lo���=�H'���_L,Hh�)�}�6��8�9L\��O)�<����P��[r�.I:�`V�Yq�r����R��V+�Z��&�D������,��Z3,�&r9��
D����K-�q��V�rl|/m,%���"�����j��GP����J���OX����������VDH�R������]�Ai�Y��8�8����v��T9��n[��'��$��ZU�%�A�����I�Wq��������LD%�0_����)�x���
j��Y?�O��s�n�`E�d�������%�+���5m�:�y�I���oi��y<���t�p��,d&9��
u��FJ�zl������8�A�x����������v;�R��r��[�^��{���o�JY���Ie��p4Eg���WWg�W�7�����>����'��TF>��"����z����i�'������$$��UU������U���H��k������)����t������I������H7$�=T����J�ET����%9�����@���\P���o���-�m��A*#�I{����@�Kd�������K�pq�����v��h����_������c:S�[��z����/�N����n���������Q��@��HB��
]����>��y������� ~@Op�>�=��p8��YH��K��Qkh��`\=ej	L0��G��{4�3� �:V�]�M�;V�E�����p��o�����K�g8j��=�t
*$Q���O�N#���8�:��grh�uF�JBN������{q��\`��oV1\��jr0DO=,��v?�QTeJ�4QIZ��H�')�T`����y��-8���I�{AU�j0u;�zg�?�v��>�E0���E�(�l�zr ����N>>1���w�^��|�����������}�/�RO�-�uW/��E�v���w�x��:
o���6��i�H��5�ap���$6K�^7O�s<�t�@��0:!�����7'��9����!���k�
�d��!#����@���t����
���[���5VY���		��`�����
���cq5�Kf1�th�'�&t����D��'W��1����t�5����5[����V��/}�ujv��u�})_���x�K�4[JM���EgS�����]^i�V�Ub��f�l����4�@�������L��9.24bu�z.��8Z�s�=�n�1�7��(�B��6&D��?����<JU��N��||���x���D���q�����o!�c�S\F����2�i^*�?w)a��s���@5�l!�X��4�u0J`��1�+�C�=S:
5��`��s&�kkd;��!OO��C���}���<A�����m�C{']y�Y9#���J��S�P��j������m�H���~N�1�����L@���)����7~�\���1������-n��������Q��D^�a���^s�����)���(�������)Of8�!�M�	�o�"�y���$p�X�+]�T;�_��N�������#J���C�����S�H��N�D����g:t���O\?J�����FSk�TB���6-����2��#�+OL2yiY�V-I��G�5�_-�r�\�Cy<Y.�������:����4 �mL2C���E���������l���/u��R�1r�Z��|�OO$
����*e<�>��V�X*m����,���]�c�v�����x��h��)}k9]?B�=)�����u�� )),�8��c�
����4����u��;
��Q�7 j�������W�'�������)C�|��������9�c!{t�����B�?�vL
( y����3P~���\K����^�E#S�9��T������Q�Xd[���D�f�!_������Q���%J��������V�����_-g�
���*�PC��RdVq������J%n-���M%��BK���L�'�����hwH�3"R�~�����������^LK�TUQ�d���������/Ee��@�P�^�d�U� ����������n���?��s��qr	t�������n�������bR���j���GH�t�b5?n�<��]S$��G
����N������;�F	��b�cID���R����d��M�����7�����o�������`��������X���$(��.h�1�N�����������<Z>�/�sC9�ZZ���$�[���d���+�9�i!�h8��y�8e�V��(�0Q��������k����x�M|k�2x%�J���je����\����z'�����ym��Q����U{���'j%��'�I�������|`�F��,
�������S=;���}�O�w,E�
0��#&n=X�qL��*���6rvI�
�w�J��,����}�J!fOEb&��{�W��E�"�N����+)���v��p|Kg��~y=�w�oV.[���L!h������P��z=#��x!�����o\1].GR�t>Q���U�"{���Y��g�Y90�A�|��:�����k2^+��9�g���������i
u�5��$�������������~c?c��o�w�i��{�(�?R�W�oA��%b�����_����(�E\�Lp'�U����[r����%
q
1$��*��Lzj(Q��`������u��0�T	���F�Y�?:F��>�Y��,���u�d.����� 
ry�s��*����D @�{<�<z��/��py}���^^c�m%���l� 	t�{��`�:38j�dZ������^��Z3���d��)2m*�z7�U�m�Aq�m����r�U�S��4��n>���z*���ak[���D>�����nO��T67o����Qhgv?��>Pf�=L��D���K��9��[���;�����g��x�o�[3�m+�7]znz���1�:A��R�S�C�:7�-�����Y����(�X����Y���`��D
W?�����x�����/��\:t����N�����L�u�������l=��7��}+����f�~���d�hR4(�$?�L�CB�����]f��^����a��������Y/��
�����}���s�d|�T�����K�n�Gn�a%3c�6��lTz�{�jc
r�E�Lu�7t�x���-[H��yZ*9�5��E3�OJ�)�EP�p������a����gZ�:h����u@��^Iz��V�k�K�p_��)A�A�����Y*��W�h�[4]�Z��CJ��`'�G,�La`s�4��������b�R`�����R{��_�?�A���=X`�/mmc!�,h������x����`��� ������'�Z�x����T�(<10�q��C<�����v�q�V+dh?f�_\�0�V��Vp8u��z��?�7V,n�����a�=9{6{\��>��TTD/�PL�:��_��a�����%t�|(E���i2�eaT5���(�o�Zb�6Tp���)���������-�X�"��>����{�u/dGY]��j`�
n���acf�K�`�!5������]�S�zT�~�C�������������A�><`��������v\^��5�:P�H����s#j�a�!j�}�U����&���S�1(|�� ����Q&�������D�xx@��,�:�T�$������,-M��BR��J7T|"3����i���i�Yl��	�y����h����_��6��$k~Y�P�9r�X����rl�����>����{����|]PK���1������4�����oO����`{��we�+��cn6+P�����cm��hx�gu��W�����W���6b����3��q�f�}�z/�V��T����d�*>���ZA�^�;��
�Y`KHB�9�b�K��*�[�KT:H��9�Q�]�\*Lx$(C����X�,�51�>��[��Qbn%IO�9Sy���AO]��4P�"�.�������V���5s����Ar����G#��A���u��Z����n��UB��{|J����������eu������|�C��3r�y�	���|������^����[�|0KN�iHBN�EG�~�������p�3��T���
Pt"���O�gA�^�(X'�+�7<Z��F��y�Y����-7�qC�=t?����F��>�4�fP^v4!���o�H��Au����O��]��j��3�t���h\�l����T!>�#UV[�����{�5)��C��S����|�?R��b)����M�g�d�uA)�L�O
tv���r�^T��h��������AP�7���S$g�mAQ����+-����]�\��i���$����W�Tg(�{�?Z����5-�$��l.S��
�^\�:}yv�������w�'�R�v5L��BRm�p��>�����ppX�i]e	��v�:;����
1�
F��1�D����*���1d}������H����0��Z���c��������`��m[�VNC��f��������~�q���[.�%�6&z�E,���R�h4+S�-���VQcZ�z�`����.���X^[�h�r�������������u)���Y�q���s��60�~��7���F���J��`]�9Q��A���5���J��VV]�)�R|�:�f��<|C<�xu������?
�V�$��=j��@����a����A6��'������=A�l�.J
�E;y�J����v.��h3Yp�Q)��
PW�e��F1�"V������.H0�
)�u�D��$g���u�(L�s��v��b!��F������p����~������7
�;�����QX
	\����D:���J�_c��T$+���Z��q����|~�ZgI!O��O��P��� ���Y��������Z�}���0��];������3����s<j�A9�i�=�h��`��;����y���� ��$	�i����:]�����O8L�J�vl����C�SucQ�9U�
8��?�B��Q�D��j�����[q;����se����\����"l1N�.R_�|�1�R�l���b3��
Z�����;�8���p`2R`��&����@)��L�54���S����_��S��<�1_����$���:\�6�7
|���TF�*\����#��^��@�J��Xu�u����$�X��(7'F��$T�'�����
���'�\A�&�����������8��g��f@9�$������)��	U��l�03On'�]I�����=I����;�z}���� c�)���C�c�lu����[���;��w��8�u����=�S��V�O^`�m���������U�oS�p��(��i0y�������e���j��{�k/>r;�����f�6�����9FT���h��=j��/T��0��>���(lMC����O�Z1G������K_nk��AXg�i���������1��H�Z��v�����`,��fy�H��}���=K<u>T�9�?�p�� ���&?���r`?��Y7�������Qo4���O���l��"GE!M)r�
-�
��:��F�G�l��1C��6�N�A�O�A{��Is�_�6��
����w<r�o�G>u�JS�M@j���p���z�������J�����+���f������p����������Y��kf�7�5�A�q���yO�|e:���~����A����D����G�t�D��
v�� �D����x;��CVk�p���/�����0}��w&�N�Q��H�\?���t���-��%�o��B1@*z��[�?r3�����,��������*�_/1����^���Wq�������}84~�3�VZ����T������I��4��A)(��*ob)x�42v�'�2�-M��O�Y=��*>*EM�
'�.
�����?/�(`PP>�q�:t��JJ���V��7;-3���q�++��2��i6V����d`�/��m����.]m������m>r
���=�.\��I��h�U8�����~��@��yC�y�E^������c	u����4��?�����^gw/�������O������������������X�
��!X-����qW�i`#���p�m��q�^��F)��_�����,�v��4���Q�^��{��p��H����UZ��2��[YI����9��P�����8����,�V������vwvKuQPld�(��Um�����*E�8j�j�M��RfL�_��KLG�'aV����TJ*��|6(��7|��w��H��`B
�$w����$��eu�m#�9�M1�S\�'/QC$�o����I���Rs��t�J�� �bv���x���D������(�\eM���Q����Sr�OJ�#h�vF#,}/8����-��u���� ����Z6��`}
����a�Zn��nC�8�*����|NApsjk��,����)�����-)�N����p���0~�Q��[e-,!D�������,�L����6,���E��{��B�9�����z-���}�F��
��<������J'ME�W��3���" �,?Q���h�����`cf��{�!F.�R�%l����C��O.��E��vYh#n�l4��-��TOF`���������)���>h��
�]�������9��8�8����*�^$�?�e�����cB7m���A*��%u�r�&�?�`�]�;Xi�d�Dy�{�FI-�niSo��I��P�
������E[fW����/���zTcT���d������O~K��u�x����2�J2���$�t�rC���P9,���0�������~��p�)������UY��5g��������5O2���I�[R�lU;���T�����^��
�&]�S�V���������7������?�]\�]��3��9�v�H�KI�~��X��Q/���y���������tD3>�c3r�e-�P<[��Z	���������nz�<�������J���F�P^4�����_��#��R���d����%����-�~uz}�6�*����n-�������C>��>�r�V��2z��GX�'���K.z�y�>Kh �kc�A��7=+}~�@��������=���[�H�f�a��>����4�H%1O�X
�Xo��F�!��S���25
�
���c��:'�g}��R<%���f�Rtt��G�=��c82
�:�&��ng%���pQ�J�z6����r1AQ��G ���*����=�DnI������W�S�B��E��(��x��^����@�trMp�Xe���O��ld	gS�E-���JD�9���T�Y�n���P��`'&*|�40*M2�}��Q(���$�F9�'*H�ti9����2���
�����C�=(��"���!���.�Lj0V�#�(�����b_�=Z�E��w��=�j�E�\J��������2��<��,��� ��
��^vu�='����p�gb1@"�p��A4����������"c�&����l�3���_q'�]"���g����a`�������z�N�`�O�pz4n@O�q�5�up�|yJ��{_SNL P�H`hW\B��
�
���vk��G��j2��e�Wi�@s���b����r������W`XU<C�#R}Sl��Fd��������h�����jgz���wW�/��fb��F(�@��b$)�\*��}�T\���#}oa7�n>���S�T%R%�*{HR��a��I��$�����*RjFr5w�'���9��bt�^D�i�o�_��z���w�r�$O��;g�X����RXb�9����&>:{�K�w��\�����^���2���_��"R���4��	��U�&����C^��#����,�~�k2�K����T�Wy?S�� ��c0D6��)Ek���6;�Vv:���.�s9�H���e)��'y9b�<q8�� �]���D��A�I8=K`yH	�/��p(����p����
D�g�d6P�C0�z�RJ����dZ�I��#��Z�h�$�u���:�Z��.#=41����m�W�c�����l����^����~���>�9?!,�q�#��a����TK>��w'0,"��N��L�M�gp�g T���n��CQ��^�r��4��T�YN�rT�w�*7IS}B���\�[�tGI��;R��������J���D�����L�����kL�*�,�^<
�Ow�8�%�q}X'F�U����W��X�������<G����0S�����X)$�M]� r����6���~,>K(�s��i�O������J�)�����|;.��Y�G��C�eG����Z=7��<����K��x�������&,N��~�	3���\W��S&�:t�d1�WU���V��R�\��h�mV[{��8�h/��[�z�e+l6O1J,-kR���?�U�KU!���D��J�P�����_������6�{K����Oh���w�����r��)���������yf!2�s|�h@h�-���&����];��N?Pi�����x{=���Ew�d��&z��	hq���?�������q-�U���@<��EA-��"39_L�����BS����&���0a��_^��|�3,��i:�"�^�u�����*���+�$�Y�?�k��*���-q��m�C}����T�\���k?~PA0�|�
_9�E!����W$�"���R���eY����J:��1RPY2���wm�``,68X�O>#"!N��T���jR�M�����	�����BmK���W�o6���oHs7��x��~�h/�dK%��u������j��E�b5�r���(R9��H� 88&�=,X�S-b<�lb"�<j/��X���N}Q{�,�5�n"[�EH~"������PE�;��	� }���f4���>:;$7E#�������nAn���9O���/{}�P���r�LU�O��I��(F�����yL����E����������c�ze���R��\y�"XV
W�o��R��,�y�l�,J�i��>����I@�^�Y������lX��3,z�XA�U
c)}�+�ok�&E�x�5���R�'Z�[RV�s��Y�Ujx{8�u���FEFt��-qP��I�R@�-uB~���h���B�=*�x�J���K�|AD�����L��$�$��D���W�C��7 P�;�C���d��1�
RB��
�YQ�S\yd�#L���F����+��u�IDwv�kS�(c�Q���(s�j�����J��%`}�4���{=�������	i�+]�:W���`rN��T�����9�w!3������9�x�k��5�z|6=��H�-K��M����U��c�(u��Rf�����o�^]]^�;�V��'�?�|E�������������zrz����ue���$��B��Xz���L�8��L����(��s�
��?��[����,����y/�~��t�U*����:��bz(cn=�����������'\V���f��pIL�"��f����i�.�q[�=2J��nb�T�N���������$+��.ZH���2)Vc-��9���H,�-a���\�xi.�VLs����P�*���b5��`XW_���������#�0^���L]�W��/����b]��7�\
�y��j8�Z�+vX,��T���r�������2�.3�W��Wr�JM����2��}PRs�[V������)1����/3A^�����b�-n����+S��	���a����3kF�<	p"����B���&�b�-����k-����Eu���-����e������������K�}�\=mKd������B���2i&f���R�}��QH<���]��h����b���@&?����Z�����%	�e�F�&�d�\��u��*�)�Y����M0��4��E`J:,�T����<}��-8��mE�h;���+^�y�54�,�w1S��
�5���#G�<��0�������K��q�pv�M�j�W��GN�b��G��Qx��B������	������P��� eFr<����Z�
�mL��=-�������[���t~N�����[�3+����"���a�w����m=q�q�������Ty�h���������!��q���q��<�~���<�PxuFo��������e�����������~Tc&a� M��^�,��M�#�k�o|)7'BaP��)��8��<�s/���T���������������4q)�o��5,�bm�����>��
w�x�9b��X,PB����	)�>Wz�t��Rb_�uD&2�����:��
��o�����D``>��T����Nt��r]�,�l@"�A�!�����4���!V�����=�c#��'/�qv=����HE��U/wl7�[)�R���-���6�A���6`}W�W��I��V����*�bJCsm�t�b�2hr�������i�����mKn�
�(�U��cgE5;4�(U�YPZ)8�"���g*���=�
���D
��+����:.6��%��Jm��v��"��u���P���p��e%?xZ*�_�{o��G{��;J!� ��P=�\6��`N�I4����n����7���h`,*�40�N5��,�(�e���[Fe�#!��
�J��@A�y=c�8*��}�������9��rc��P�����i�i���U�� �c�c_�P�%��0���Qr�N8��%�f�:���xCH��c�9TE�tO��oT/�Q�;��\��[���������������
�J��R�f'�'ul)Ko
��O��Tk����'F&�]>�#}����8-�K�{v:
�6F�DF�^M&���a����;���j��S��4$��-���F4E����k�(����`+�o)Q�����\�l�t�R9���:�KV�����|�.!���4���u���f�cr�d�s*q���!���uC)d4u/�� �9�;`�(��#'��D�����@��6�b���[H<Ll���p'^�d�D;�eK�{�sMDYM��O���n����f��`�]�5��p����,�z������J�sYe��dg�3m��3�z�:}y��Zz/�GA�>#��R�2�k�u�(t�]����
�i+ws��@������M>"��<�"�T��po�uZo�������=R�H"pj�Q��S��?�k����N�-�����Z�B^��jsVo����|�s��xR�Drx��TG�R�p!G8!sS�8
�W9P��~����k������F��'�$AZ��|��[��������X���'Cq�E��1f��H�	d���AD�������p
�3#�\�����k�rC��`~\Or����TI�ruC-eHK�SMF#��P��l q(���#��g���|��N�-��#�������`d��j��n�����Fb:g�����GC�_�c�5D/aeO(��&�)���)~I��b�R����RS�/eGC��,�~}��o����N����I:-j�%_��lC�"��B������)�#��_2�zt"���9!L���n<�D��K�{,�j�e]����h2�����''A��%8m���|��+"���-,_�Xx�s��	6�^�l�h���J�+���1�:�b���+�'��B�%=�4�����
?�6E��
;'4`Y����}��yu��o�B\��Wy�5_CB������0�x�N@v�x�*[�|p���~Fya�V�(��a����j%��|Lz?��r��8��%�������8(cBey�(Tr������\~���F�.��K"�32v���	����B��YsI,%���i`��e�r~aU�(|����Xk�w���+���2�����o�E���x���6�*�HYH�hw��n�!��Jo�K��_����F(�9���I�E�!8�����2�iQt�-Q@��t����h!~�t���#��k���QI�<1����=��A9d0��`��� ����k�l��C��D�>QD�QL����x�+:�����������v�-�'�;��G8����`e��Z��H��{k��)����$]�;e6q���t(2��H���wY;��@�d: H�]��`��1�O3��n�D�Y$�@8�
)|Y�UUe��q���S�6gwC����U�9��/z����Az]�����y��|"���ek�~���8�']���e�q����V:@E9�%�AC�7���^���*��>�kJ�)u�k05����w(|��BZL���
����+^~��K\���.�	`T8���z��K9Y��p
I�\�2�R�wP[��p�%������ �4f�����Ys��Ea�D���)mV~#E�#*4��s�����5������
������a����X����mG �$������5�a���r�8�7�;�������5K��B���`�:/�\��r���2��m�*E4<l�R
"�;�8�+��jU�mL���H���	���?5Z���.�����phN����\ns*���Z�%J6
�H;����`���h
k��W����C�������H��)���Tz��Nn��8��������G���C��h;Y����������/���r����i����W8l��7S=z{vA����k/��8�nK"J�8�l�3}�*��������U@�l�<�?QMI'��E7�f�u�'!k^!k�,m����!�1H��� [�j��p.�&����&i�9��"s��f��:zL�ad��v0���������AYf��zZa!�
`����mR����2���X������fO���E�����>��r��8����2M����Fq'���N�����S�K�������R��"e(y���D���1J���J��n�q��v��y��C�wJ����0��_��@[9���2=����'�xxO���?�.;����y�s�y4[����s�u�kV�G Y��$�4j����w��*�+��G�&���~�e�T~��0HD^LJ���h$w�	|q	��3�/s��}G=/,�as�qm�py���$��o��ylj�\D8���?�}���<�#����JS{p/`�������P�)�n�d�iP��(�3b�E1$!".l�'�f�yH�|�`���1E���Z�;n��w��>fr�4���H&3j"E������T6�����p�W����Y�x���^�S�8er,9,����
��dT��*��5�Z����AY��q(D�]4�����ax$���;@�d�c��
RW�A�0d
i��9��������%�?-S�`iQ��u�8��u��f;��V��x�-y��=J'�W��F���VV=��9x���(?�2�"J����l����F��-��OP��&p��6y6;��<}��F{��X��$m3��DvQ�F��tu��Qb�������{^�'9��p�#U�b���}�}�Z��%i���o��_�]��(����4
\���dA9����8�@>an���q���\G
q��d�������#/��W������
�9� H���{f���������I��z�l{�e���3�w�.D����cL��-D7|e�8����uP��r��7���l���n�����V���F��7B��~`g�����0��pN&���������)o@%_!Y����6;���2w%�s��L�f������-e�n�V`��2��T����Ar��G{��Sgw��a�.X]n�X��<a�(�����`8�Q0�@&O;2�g
0�p��5��
��<6D�Q*T��h�����k�/�"cg�=�0�B5\	���#�����S��EZ�Z��xM���W\3�#b�����S�X9/(�F���[��&�A�Y��f�-XE�
�t��F��vL�R
�oGr�N�"��Z�!�
m�TMS#mk�����u�����5t��������c�`�$�����kT
vi��
��_]������E��}�X�4�E�wdw)�j:��q��A�����a�p����*��Y����/U�['���g����W'���~���qs]H�������Y�c��=^����:��F���I`���;M�:�U�o��yp���t��^����G�s�O*4�X� �4��7|�E��:�%�x��I�M����
������[I���]��T�[�ZDE�,��.�ADy=MmA��#%�ll�6Q������jt����u�N�	�tCPel����\�U����:�����Y(aQ��V8���O��� ,Tpj�U���� ++��1�U�Z����AXUqT�Rt�T��1d�O4�}����7&��(�U�T��.T��&�#-�l�������V;S>hM�U�[5�����������02wU��$V.TEA
/?������sL?��o�5Cj��|q��o����X�u
e�S����dP�j1)�	�R]���(�:���i���-�e��3�c����>f� q�n��|��\N3����`d�'%���(��krS\�n��m�o��&��[em��������6{��}�^e����|�^q!6���<�	�^��I*��6����r��������|Y�
 ��R�0�������X�k`�!�k�1o��"����@i�\S��\��v$4;�"|���[[��-[��V-v�tC���T�!����UrPK�%������l,a��
D^ubk�(g]Tf��`x���l�j5����w��~��fuT�y�+WS�!+���K�de��w<b��=j%���F���.�'�P�����6�'�H�5���n�����1��:�f��-K��)^�Y�x>�,hD�Dr�Z&+�G����yw��Pf[����R�Ej��E������N_�?�)x�P��x�kx��_�I����#z��w?�.�k��:T���)����%��U��������v��)�X�L�m���(��{��L���v0�ysuz������u�����ey��E��u��$x�����]bi�����������4�fK� l��#k}�V��������X�NX:������N���
K�c���I
fyM#��6��y���<)��1�W�E:6Va�Ef����9 �1��#]y������+<}���lb�a����dv@�,�nq]8# ���]Q��)�}�	+C."��QR�r</
�����L/�W���T/����'�/-	N������2w'�i��R�F����'�������&�(�7rsu�����\`F�l�"�n*����8��$
�!;rhU��r��]sn�����C ����(A@�F��0��bU:�� ��%Cdj+��]3xvY�l�J1>w,/����3�.���p��5i/.����H�^s{��^��(	R�����
�o�C"�m�p��J��j���E��B��gR�B��Z9�����a��D���R��#���"N���������m���@���`��O�TGY�N&��\�X��I-�\qb���"�f6t�Z�g����w��=����W����*�������R)�E�zn�������e&1�\6w�d;�{�Bj��UH
	�<����u��gy�.8)��h�Wu�
�s��Q�� �R���r��C��I����z=2+��L�_0��1�9"7�{6�)z���.e������Ak�n���;�����-�'����-��+T���T���X�++x�w�c'�U,�����f�����&`�I���%�[�?*�!��9!Hb�%�AL����0�,y�	�OL!)��O%�����������\��N,]�����/r�`88�����a��A`7���vE���s��	��pr.*���t��X��F�A���(\��l���J�	���1u82v���x�VH�������.-/R&��~ ���
q����b�hY0��6�)K_F
!�������3(-dN\���!n���)GGV�|�KI��KA��dl=-3���8�Sd)��|f��Rpn������g����*>��0a���p�
��^/m���
N��R_���LN�UN�Beaq��R4^m��Y���7�!�,x!��rN�*Fb��+�
SGy�������b5]��N'�[P@3����5����@8'������ld���%oY8���iz�*���&��Oe���!����p@��,�V��j`���%�K��5?3���m�����<}u}��Y+%Wv����f%����O��Z�=9�|��L�����\�e��M0�|x�T0�6�����e�LoXG��}�/R`(r��<��&�z��o��������� ������)�A�R�-�?7�l��Z�G\�r�:G#�V����M����Pv���`�]���SG��>�����d_��~cx����`�o�u�f�������j��m�R���F4Y��*l�sMlz�g�I�S������m3	��?������q.�x���d�������C�o72��8��iw�9(�q��Z*��=�7EK�Fq�r(M��z�=[�o��d7,��z+�'B����-/#q������M��Z������S!�I����s�.K����Zmr�
�l&��������������-Zb�oS�'�
�g�E]a
R?6��*F�*��:��+#?�TC�~���d��).�6�l���~�u�_�����������z���a�RT&���9cs��~�O��p�����n#�gj+�%��#��c?���>8��w�XB}D���f�w�����p.��8=}�|`Tt[#?�:<kW��^TN���='��Jl{���eO'��}od\.�����f�+�C����aT�����Yc��q�T�)����P�n����1HS�Ik-�4���T��0f�g�[�m�1Pgx��]�2��/A~p�V��Q��q��t�i�U�n���/�����._�]�,~����^�3L�����;!�0{��Ta�F�
����)��E0��!���y��r��p��y�����~�:����&k#)�
�@�"���b@�1!	t��u���7�E�[��T]Y�
?q@�G�FY)�o�0a�*Pm��{�a�k8��#��c�����40S��$���\�m�������]2PC���^J)���<���6x�<?o�,2�-��,�a�|��`J�`<�z�nf�����&?5�x����J�1��x{|s���Un���f���v��G�-���L3)'�����3�G����	��A�G��)���
E��������d�M������������=��Rn��~���\��"}a�,����,oi�MR���f�f��IV�*R����W�9�������"�5�U�8<q�S�N�8�Sn����&�=�\;��o�+�Q2��mU��Y}���� ���.��TIs����#��:+�l�����:Y����P�|2=�������n�L�!������&�r"���*�+�N���$�F�hmE�_��)�	7Z�����h9I5�u����c�������[��]��, �b	@���H�/�:k���3�(*��f�����s<e+.���������X >���O/nz��������������Y�����]�+l���B,7�WS�Y�_����Mq����������mmF$�t�<Y+OoD"^U�2&�|���ht-m�U�R>-�C&�d�U����,�*���Z��h��R�����)��itoK�v��p���(ZrF�����!=
9[��&h���_���\�������gI=���iP��-�l�IuH������`{��m�2�[�D�`S9��1���'xJuqw�~���cFT�S���D���=>N�_���� J��{�W�����S�0�*.�>��{������g�,a�`�*S����B�3C�n�%K��!:��K�X� <�0���$��@�@c)��t*��h�DGp[�Sc��N#k�������zxm�$��-8Hpa�mqc
a\���at�d dQ�m/%N>�4���Pu���|����R���A�
�rAor�������3�k�Q~#��p�Q9�����.8��E%�O}E/�s�9M0g�U�RVG2*Z�����W��Rx���rx�F[LM�J9�&w�^
z����E[Oz��|�\�����!)��t�jJ��LI�W�o1a��R,���V0Q��u������2��gK����0���F�	�T�#?�����T�p���J��N�����xO�����\�����$������������n�R�k��3Y�Jbu�
��a8������G4����y���p=�r���Q�����h�*�m��>U
���<�B��-���q>�k%�K��XE�o��,�v��G�����
����c��f����I�L���z:�0N�
���7��z�>�9���bD/���Me]6g���k���:�������������y��9��+pa��i[g���0�
;���S�W�
���GI�n�?�'��CJ1�������&L|EY?��i1�d�\��U��+���w oT����{�?3$ �zl\��~*
!��|;z�I�l���F�)����KI��:
����w�������W���[���=uR��O=�����
E�
����M�(�"�s��0�}�}Y4)��hC�b?�D��A	��t��<<r��oG��0h7��n����/���C�{u��p����[RZ��Mp1�j��x�.n(��\���������\�N�_ma"��)�������8��7�`i"h��m������X�
��S�oo��Ua����������������/�@��[M������/^|xNk�b���t�&�3.��P��Wg7������%[�
.�^�l?��,e��-^�_�����=��zc��/8�[5{�N�Uy�VX�e/��~�h����n�;�,�B�f�m����b��nK������p��N�_-Zsz^���x��S�����./N��#�V�x�v�w������������a������=Zp�!����-��=����Iq7=U+,	���m��@��W��������O�
��O{'��7G�}�O�����U�.��������s��=��G��a'�(�x��������y#�wy���W������������������QkS�[
�� �^]^�������E���;�x�Zl-����+�{}
�U_>�{m�0��M��/�]���/��wt������Q������w��{������8Ol�"i����9�D��h������sx@�0&	� -�����o���)�37��O2���F/�����
N�CU�(
�|v'��U� ()���555�. ��*��@�5~��=�e��JG$�A�_ ���	W�q|0��P�
���zH���Q.+�}"���.�<��n��9�"���	�T�;N%Bo�X�<�%3��wR����+xd�63"��)�^T�8�S�+��q�Z!gT�!�7����f��c0X�(��mXtw������3��?�)��=G��'D.
�M��@
Ad$�-{�������3�� 3T?�j�oJ�z�!?�C���}d_�n���{�%\&TW�����
Y�6����)p(�����N���!�����2j

\N-��,B\p�$�N;�M���������q
�W���x�X����4&���Ug��j�K���?T�P��<=���$�?F����q��������]����	���zp�An�%A���,��rP���E���4pz�`�������a
A�������A��1�
K{$�^�fB"aE,��m�s��g5���#�%>P�f�1x��	��p>�����3�g/�/��C�X2���Fu�9W �+;F����=�F�r�v��2�P6@V�������N�	hl���%P[j��g/P�4���'�6���E�@'������S[!�<��Q75NH{���{��4]�y��+���3�����jd����4%j�i��8���{�Qa���sv
=L��5r[��`~KM�g���c��}��q��
����k�#��qo�������l��3P�	����vd��������t& �
���JhU�d������6I�H�w+��!y�QV,�7M��nK��9p�y��]<0W��H�x*��a����"k�s�I���9��*������#�^=1bG�!��C#xp��^�.��6���Q��8�\�PY'`/����A�NNfm�n�������d��zVJ5R�=�{&Q����l������-B>��2��>��=��C=qo��3��bO�@(>���0����jR��A����`�Ar��a����@��\��������!�2�U��i����H1�/�Q��C���=����9F+a�8����'B���RZ��E7J���j�b��~�/���a�i@��]h���:N��9���.�����ho>LA}��&�?A�<��.z��l�E�(�M,�r��U�C��L�����=k�J=�8
q�O�(���I�	7cO�=��>�^�jl�`c
���z=="�+8�����	)Z�������������$���M��%I����L#����{�'�#9��}w��A��h��3e���������i '��-�N���BgHR `�!�������Hl��&a
7FS�
��V�@Y�+:l���y�mr�.*�OkN��B�K���Q��}�6��h��%]������������A������Nf��>F~�wK	# �0Z�L)���@�a�2w�"%=�7)h*6����l�Q�����U�L�g����R
8��
|
�_��*������2G�j��Xb+Z�.C{T�U�yHG�u�Ik�����_�F�!�����7�FA���v��W4`�p�D�Qz:�]�
����"�P�l���k�i��Z��J9�|���X�y8z������2A��Y9=�|��m<s�	X��#����c��5-�inn��Fwskce����R��U�:�TC��
�w�D��i:����Z�a P-I>�6o�!P"!�k��P��B�:�0�Y4��Q��h�O�=���]����w:`�bkq���v(����m��<{�����������L��z
v�q��i�P������[_��p��A
^x�K���8V���,�&j�e}��AG�����#��s��Y	<���}����������d�	����G���R�f�!]OAVz��;�~W�(���a5���f=7@
�(p{1�7��E�d�����O�����-����O|g�m���+'��=ir�������C�n�������7`���d��
W��+n��
Gf��]4��o��r�����l�Q\"�=���o������a���	�m�������EM�,��U��xF��s*���$H�������x�a�NP)�sN���%39U�5��~�r"
"�i"��Ns��wjwZ��*�b��Jel��/��}[jm���zk{c��
�t��J%�X6�`@�ih�HwX�~��X�q���X�'��i��n���@_D��(�����\�5����(��B�������I
O����j��%���yMj++�	���Xf<���Y�(7;LM�M��N���,�L�lz�3?�K&4�4��������&�zH���[5:"����(��`�|��M����'����Y��������
�tx�c�Fb��D4��F��	Cl��6v8F��$�, h����p��R8�g��D��1hKCnCT:�����E7_p(����Dzbb�"��D2��GjFy�Z�;8��u���M����f�F4�G�q��"�HL�P7�����
iu4~����p����[�������b4�6�G����1jt��Lb�.5��P�4Y��w�a}��)�����)B�6^n��!�:
?F]}�}�2��P�^X�
s��d%����o�^���c}���i7N�h��qUi���5��Z�j�����K������/[)[��?���.�d0�{F�x�,��a�;	1MPj�7)���IR�f� #c(M��� ����y�`�ek�W��Id^�wq�LyA�n:�J�;�����M�Q�UlX�8������O�Y*H�r;������k����2�u���/�&��%H&��,��s���e�Mf�G�.S���H3�i����^�mmaC`Sl��V���e�-O>P����l�U>/7�����i�����������
���z��OW6�A#��DR�����������9z����(XCJ�xO�����V����|�p�uut����x@�%�c����������u:�s�?p:f�u�
>�#�B�G6j�t�?�e����!	��]������v�"b�na���r"��~2��3����+���A��A�q)��/�I�1�"r������^���;��U����}N����j�-�v��������h�no���p%�Q��R�Q�4����.����6:Q�4H���;�s)��J�A0uM���-���N�y��Xa����O|�4�@�M���U�lzRx�0N:+���=����@����3�f,�h������k���b����G��S[�LNYY~^"��5Kv������H}p�w���/�Xr������������^8	�F���2%vFWH[��qH���' �3g��������vv��Fck�?h��V�����vf�,�t���M8�f[�#%
�ki>uM@ONm�xACm 2x@)U�X�StV�J�����`�����)m@�m(Q#t%��X���������*����OI����O?Il�����3�d�{��h|'����I�9�kc�����W������42We�_����L��u� 5����`�������B�aB~U���0�vs����=�����^��<�Qv���'�9C�f��8'�2�q��W+3W�pXR^��v�}o��	2���R��
++	�u��^�c��f,*�';���*&�|0Y��'^���Nh4���x���2���_�������L�pn.�l{k��;�i6�����J�8N3Ky���t��$�?fMV�
%�@jr��i(P����2.6��.��r6�D?T�qr�|z�k�y�d���B1���)F����F��g�*�jR�OT�l�����G�{�]%�{_����{]��d=p��{"#�(�?�.� ��_G���4���p�� r�Zm���<'-����QLR�a�w��d0#j�2}���_@�����]����
���D�PTI����)�4�p ��_tMp�5:((�� ��y�z�A<ZW/�(71tu2�����\��	L�����d���$�q��Jj�aG3%�� ��&�dEz�H��� �t8����&Y
��������J&k	��-��Y0�^v�/�����T,S#g��V�!iJ8FW��"������R�M@Utu��������O������F��p�S?�L>q=q�>K5�Y��6������V@�S0�\<��j@��7���W`-x�����|x���	��[u��O�Z�z�����Y�#�	x��og��0H�!{�-����)#}]���g=��@:��KN%�fuL/�
�h�-9���T�(\}n���9fu��)�l��F� hn�\x�7�( 
��l�����K�����c/Z�������c'"��t�4�G�Q�����d��H�GQ��W�CE��1]+����kKd[2	�s
����_"E\.�� pg�w
�_�@�_��iHaLp#"C
�T��D}6���cgVJ@�������8���S��v�h��l1�m3�����I���$����h�j���e^��.������f���1lf�$��4��D�N��fuqi�D:��������>u��������B��L�JmE���0]f�����p3=�	����m}�����kS&��$P�������-k��>�iPf��<�������9!�RR�"�&�'���4��a������0��y�(��=������
dK�������c�l�R9S�B[�����%�>5��a�����&a���!20"�N�sa2)'�?����H.j����a����Hs����KJX@��E�a����f���*�����(�]f"�#�3�-�V���(�lOhR������;e����tf�japx�0ph`Q����"��!Je�k�����%WR��Y�[�F��GF�E�3���\�vJ��� ]�m`-a_u$[lUn[
��s0�#�r^f��X'��9��:{�i��gO	�:o�[����F����l�����6��f_)�qml�0��8���w���nC��_���b1������(P�l���V��s��J�N ��������M���9�m �������3�"�m	td�_Cl3p��t�.i���sp]��"\�����O�J��a��5w��:����.�4�h����sY:��H��h&��I��j���*EM�?��
O^�Q������Nm�\��x�r�,��o��M������}.��A�_x=�of�Gw�{�c�2������f�1����JL�V���_c�Z�?P���s�V�5^�d�B����t������u������Z����}w��������� ����4�K;����K��
��N�xM���@B�3`
X����A!g�.#5�./���Z1�p��8�?�Y�s�}���9�z�������koD�b��5�'�\B��#�P�IITX7��S��1���������l`�K������
�
�_�4����^�.%�_�=��W���s]�������`���7 3}R{<I���W����.=YC� �=��X2@�>o�As��P����h�R�U�$k;��{�F�����|�+�*E�����'���k���rPvf��C��H3�E�\ab���X����8	�	�;T\�Fx�,f��Oz'�A��]��t����I�B�	�z5�
0��&7����C�yF�'>"���=�w"?<a,�#�E�^#G-���������oF-����@��4DEw!W�� �f���R�|kc;��+���Ky{A�t;���~00�����FS����0pp�CEm/Vg#�T�;|�*z��7�n
�����Qr���F�UX~�E����+���jm���BC�va��~f��S~��f�dL~�aE@�����y@��Fj��]��{�:�	�"�\|���B��	����7w��N\yz�[���mY���JQtb�o)�
�P�XJE>�-�C%�g���-aA�e��gGU�������@9������Y<l`M<J�6N9Z#�w��:�u�3W��@�u8���^)z<p-B���jnK�o�2	0DP�V-��.��jJ�SJwc��*��bb�}�O��c�Z�h�d!�+�;�T��P���]9�{�}N��i<Q'���g8������"���ypr��,��j�%4�����>#�o R������W���8/���;@�D��Su��x�����C8�d��[�/��Cu��%Y`���lw*5~�������,b[S8i��>�hx���H��$,�F�<�[��x��`���3�����9=�������q{I~%��8�k�!\!ew�S�b[0RW�������Q4Q���A��Y������3�MM}�����y������'xw����������]���w,^��*��h�<#/C������?p�������?���U����t~<�������X�����8P�.��A��0��l��Gys����fk���e���M�L�_t�u����:��N�����O�UO{5�P��i���e�����Q�j=�F����8J�5��!��c8�SE�;�I����d�	�B���a-��{����z�K�� A�_�
N�#����|�A�L8�����()��0q���'F��:$�#���|H�n�[3�9�Of���mrP;n 71]�h�h<�{��������t��e�����KR�����J2����|��y��!��&
�L�t�����=��M���=�Pn�8�5���y�$�/N���7a%���nq"��Re`3�L�%��Y�Yi]:e*p�=����e��[>�}�0F-N���w	0����}��	���K��%�1S'%`�K|n��g��9�
�:}N�mL�����-����.�g�M1�f�r}�H�y���v���]��u8�A`�����#�H��[}�9y���j�����'u��V���8nl!<�.�$[�Y���������`�M5�f��W?�?��\*OE�����y#o�	��,f�������i���]`��f�Dw��R	�?+��6���CM���Xh�A��g������-�h79�`�
�������^p�!����0��������>R��_cwH\�'���%=0Ks$��M<�$���5��W,�m�����(Y�lU��!�8��|1��5���&��6��v��&��|O,�W����������dk��7�8cK)�H����B-���T���m���F�����<�l��
�?��Bg����8�&��:��0$tq��`��9�����r�8R���=�3D�����b�O��OI�/���CI�v�v�Fz����-��o���)M��U�E��L6�x�Noq������tb�d��~0�M_	qD�-o�Z���,6���_��a�;4��{���Pb��J�VK ��X���4,W��
�XEhJ��pm������V�
w�VQZm���R^E�U�"D���$ O�S�O�\�vRV�R���ujw�Z�J��|i�J?�sy�������Bca"��YQ
�ij����m�,{�H���n�������h�������.������=���UexzQ��Q���L�)��L%�[@3�������@da�Q%k��l���_8�q�.#u����c�S��&xH��5t���T�����44v�E��*�!1X*_�R��Tj��� �Wj>�����3�e���`F�b�L��hn4��dx��K����D��2�����[�D�m-n)���7��V�����]���vV`���e����(�6����:3F�}�������}���{��%�.y��*C*sy��Yxo�F�:�#���/�l�D�Y����)��#�+u5$w��~�R����q���]'z7)��GW���$tu��])=}��$C��EB��]�)���������I��z1|���P�t>S�)d������"q�nm���F8����<�d�-��C�L)F�(���7 �x79b;��dQq)����d������,�g��
Yws��d6g4cx�Lowk���[~��)`Fh�B�����b�%�1D4Qy��/��N}��h�^����WU}�</�K��<t6�A�W����RE�b%��/]�p_�e��S��~_��'g������@�K����������A���]�F���xa�4��}@���*��T,����|���_.����4^��z"d����)���1(w�c�C!������������o��Sa�����}X��I�vw����d��f�����.n��I�e`Yw[�?�R�Y��!8��nF�FR��Ia��5��'T�RN��{R�����()�,6�
5�����a2Z�'���D}T�c<�T$��1��ZTlZ-9����5vuC[!�����������Q��>���F��4�h<O��Oq��i@��Tb�p�)�/Q ��Vsg1���n� �/���#�1�(�J��c+~@8L�%�Q�b
�)_�%d���')`*�i��{@~3��m ���F�FZ�4���t68X_���!��E���-���
��������]h���"I���1��	�������u��PU����Z@!lC��]����|��-.l&�L��G�=�i�6a��fC�Q�{�@�>>�"���D' �34���Ysk�=l�4���������Y
�3�9����?9��G.�!M_0�3��	���9�S�
v0��3��n���Hm?2�=�H��`g����~7�1�=��DYm�������2Pob.t���X��ik8��(��^��-�)�;�J"��/��M�dj�:����(��jm�4���3.A�=CC�9�.?����wu��hj�gc!U2���X�'������AW���a��0������bel�|���X�J���,��t��9��e��<���>Y��V��E�^��k�cm+�6J��\yv�+�+�I�:��������9o)`%/�|������S�����!���
:.��
�e������Q��l�-��������%����������^�9}��V����2������]�k���� .����^�_^��u����Jf�K���F�RP4�����7���e���Vp\�KK{�����S�l1���A��M�����[���� U�\)���K*/�� ��@4���J��TQ��'�����1�p����/�w3Yf�_����Mr�B&i�H���F�
��������3�)c�V1"�$�����T��"jj��BHM[�z1K�%T1g�S�x�����t��!{�` �����L{;���&��6�	��
IHe� e����R@[�q�"��Yl���{����9�U���T��}�S�l�e1�p�}Ic�<�V�Z;�rM���jt5���m����.51
{��'7\c����^�%�J;���w�a�7T�7�d1U}R���1�����I	q16G��������p���K���on��IIW�S�~�I��()�����jA�&o������
��>�>=��^�e�\�Q�����w���!< "�x����.Z���8�x+���������i�k�=�yi�^��p��aW���b2���e�>�Z���V��hD���(�)&�\y���;�Z�
�,��y<a���A�`������7������=����������T\�o.������:[f��
BD�+��Y����7��W���UR��Y�(�^��|A<.�����0�utfC�<�+����
��Q����
��[�]�E�����5��H~�c�E���W1e���=
�?����d�"�R��D��2�-T3`P
��G����Q�LA'��w^j�v�[�%p"5o�� 0����|���������y�������G��_��d\k����w
�+�J�����`pqrpVR��������A�wp���<�c�W�B�~���#$f����^B�V��m���M<z���^0��	����4���ZA������e�kU�o��f�����c�(�Bi�Z����&��u �oN}��!"�����|���-�+A�%j�tg?��/�E������?��6�I4��gW}���zj��L6|�-�f������z+����������������������z��[��%�-�6��B��ldH��X�3����}��TuT,�(�Q�f�r2�(�G��AC7�Y��4��$�0,�Z5���/ ��G�gW?��{?����h���"x���=^r�I�>�]�3�����h�6Z�pg���n��e����
..E�	xM���"�`1�\�#��3N�������/Z���b�Ot���-�����i��>r���G�!�^�
������_���F�����q�y������"�{�a���+H���}�7�xH�sg{6��7<jom��
U��j�4��f{����?A����o�>UW���,))��
�%�i0���/��`0`�S�*�!���4ACX���o����@�z��AFH�kL&��	'�G���Xm�Q�C�	����!O��5B����pR_L��IxC��\�'N4M�~��g����S
qp�'R������H;>4���@�����K��$�>�����a��Y��
���B��8��-#�M������h�v��VX�G��J1�[����cR���Eo��-��HM���^�T��x.�{$3���]j
�"_m���^8��������<�F��?AoHU��h�B�H��Z��u�!��R]�]����8�4�V\RC+%��	xu?lF�&�[n��m������%�j
�e�L�7�p���%��,I���
�p���q<37[��[-E��g�1�j��;"��.D+��8��%�}=
0_2��!�r������U}��}��Z(r��k���I'�/a�����2�]_���n�qzE|�!#V����.�c�'�wC4�	bR&��>�?����������#���#����s%w���,0��	e��*��8�<�}��7���	�����<d9����{�.��w��"��daRSK3�Xu���M�#@���������)q7��o*�����<P$���Q<��ua�=�����6�c���4e���3n8�[�p�����,�"'>.����&6�o"!z$�)�>PU_��k<r�Xw�:|V�Y`n��j���[��`�H���1�3���)���a��'I�U�W*~�O�gb~�f�Hos%�#>���j�����3�
	k�z�5��j�E�%E����:��S�#J`�l�������@����u��~�c%7"���G����t�S����K[���=a�pzd9I����D2�f�X�[��Z{���FPAQ�����?��*o��[X�|����k�Z���dH���rU���	�Q�<�V�dyn�F��K3Q�����
��W�\�epH���� o<�qa?�MH��jC�'�0J�(Bc1������	�h��	�,�Q���U�������B(q'�.�~�R�R�m�!�� #���p~ir�2�|f�l�]om���^�Q�$���U�=G
�WQc*)b��4F����������;��,��`R�Y�Ym���z>N�D����,��fJ�{2Pj��KL���2;h� �uk�]�xQ�[���R�n�|������yn�K1�V�w���;B�t�@�6��
����F$�	�������d���u@_Y�UG����-�`n!���n���DQ`��-�����_J��`]]v���G�?��{��=9}p��FjL'�{Xu�}z���[����!���?����{�9���9V�t��/m��u��@����i���Ul��$��Z`u����+�Q�^�_�M*��Di����p��=���a��bZ���8��7�?�����k�D�L�M�I�ob���F(��%��d�����W*D����l
�'=?��e��x������P�G�o)t�~)��J�J��I'G�H��� ��S��p��0��@���j��@S�Q@���:W0j{���QIj����<���y����)Q�g
�/T�	�4����R��'�GCr�Zp=4�5T7� q�{4\`�Q��'@�x���VP�!�,�yM� e#)����X��?�h�1g��4�� _\Z�+����E#5����� ��Ck���o�C��se�q/���	��G�d��zR���0�Y�G���y�"�����S�;<N�`p���E�)M�S-KKzO����@���F
1�Z/��'�_G)�5�)�S���K�:
�+�#F�m�l�L�xJ��1�/D��V-�����8j.��y���,���TU���rA-+���5�sW:!���}������t3����I"e6������G�����k���k������r�^p�X�^.�=����%�������pSEqC�C�:e}��E���>�O�����P!��bZE��7���>+�6T�L�+�j�7��L��Zx�N�����X-5��a�C��?L���V���y#�
��g���.L�o�	l�>U�A�F�l17g�m���3�_E�R]cz$`�A>��%�v�S�C���(xe�H������e��h7�,�i���l�(c<���#?dSu������@����lw@��������v��<��2�������,4��M���m�O|����h6�$ �w��;g��x���ZtT��w�A�?)y>���GF��;F*��A� ?Z���	�����c�O�����5�Y�R�fA�>\p��N��J�q�a�tO�+k�����;���?����T��=�#c4 �TgW�[SU�@]�HK���?3����C�Y�3�+��
cKCHU ���I�>�� QAGg��H��!�#l9���Z��K�x�5�f#U���!���D�]���1���bB�\�!%�)_u����M(�|LF���}v3����������<2�L:d�������D�	�F�����g������L		8�P\`s�Y��@�>������;�5�A�������*���u*�0�&nU	b�����:	�����_u�B��]�e
�|����� ����v
��mEyk��(I�m�ax�������
M8��h?�����t�\�l��iL"���	�����G��
��������z	z�����]���R0=y#Fu�}�>���qL���B����c�h^fU�*�*�q�H
	�F0�{���+����9B���
X��Lz��d�����~��wj�����N��X�T��a<��?�� 3����B�vs�,�<�����kd5|6��H�S'5�X���M�-v�x�<GdHa��w����q^UT��Y#������T-���������f[�m���4f��E�?Ij�$����[Pi��wd
�`�Hv��?9KC�(�}����Pq�$ D�A%j�>����X�GZ)xNk��P�D!=�J�]���1�P>��l������1�� �N��C�U��;�k�{v�z)�
�����j�6���&Bg����F��=��;�
��%�-��B�LC��8�Q�`��)�Ef&?���9M��|����S�o3k(��>��[�i%n>�^`����5����v���U��g",C�#�Y?��B"�8��M�a���l�	X1����Q��^v�1P��`�.��8}?�bR���J?��X`� ���y�o@=��)���,B���e�����G��?�Tw�J��
����R��e��J�K8�]����k�j��m@��(�k��EU?��t�;�������Ug��[��Cg�c�89>=��m]_]X �%�.�vw��9;�d@�>�x�so�vj-�������/�{bj��)0/�e�����i
�(�u�Iw����,�,o���J�5���*'%���s��x�����������[�l?�tL���e�s�ke�r��l�;�.�l�
�L�tQ�A���p:����}����l�1nlo���Z��]v����`K���}�Iu��)��C,z5���x �u88��K?An����Kg���KKf����etc"�X�&�����dg�C���A:�����o�������~j��t����t�T����(/$.���K��
4���6�[���[�c��F5_�\�id�=;��,0)�R�Q.���h��>Zw,�������,�Y�"�X�g�pp(4~��B/���F��n}���Y����l�h���������_�����Ua���'{4[_'�'�_h���TdI��[�0-L��u��%H���m��G>o$��t�Kt����.(m-8:�M
�2XF�NR-�	�f�Q���"m"fJ�y����j_�G��1�	�2�x����Dnm�5
)/���!n"NTpB�<�9k/ ���F�%��������y����	�������������"�Bh�Z��9��Z);h�E����{���7�BY;r�b]�.Z�~NV������CG�n*������.NW�=��k�W0P>�{PAI�|��KK���YA,����v,=H-qO�E�>{�;;�^{-^pd@����J�E/>X�1)����J:��������6p�����1rV/Ru����u���n������
8Yi[���31���3�}$����Nc���o'��V���5��o�;���p�v��s���Z��e�}9gJ��C��@���UwQ��H�m��[��_����CJZ����o�.f�%`B��u\�����vT-�5�w��" Y��1�7Q5U�"�SR�o�Rj��e[E���������E<�a���[}uz���-`}Sc@�R5����>��>]m��_c��=�����Q����nzq�0�P�k�c�%:��]+���.������M��	�b����g{J��pp����;�3j'��8�Ct��^����
��('?�����r�W�����l����9��r�U��(Y�a�l����r��Xg�/��o�y@����}g�)�8jw��ri�T�[BC�}%�d�������9�
��B��
�����V����=�!���k]S�6� �v>�s��	kn�[���~�������/�Z0LXR��hLV[oY&���=���!m-*�
�	���P#�PT��/k$��dn�i2w��Lm����V��I.��ko���=�}���E�WY�O
IL?\�1<%)	gD�?�$G�����z2�[b�c>�W)��v�jI3E�e'��2f$�k4���V3�F���A���d1-S�f?����xU�h��.^���nq���g� x�!�-Vw���yK�j�B$ap�L�gh��������v��l�� �
bG���$x�q�J���Ha�&�������hn���)O�O���d�!��0����8��:>��]�`���3��K"�@������T�Mt�	��)(���4������J;��(�;)��&��KFQ����8�x��%��KI�pt)�� fc;*P3�x'�u)}�'�b��@��uQ�!Qx=���)YZr����rWkp!���6����{���Z���Z�Ul�p��&��L������M�eW#Y9Q���T0��>��@r��{)e ��% ����K��^�AO�s����-}��9��3p�KQN��@�yN�s��']E�88���!tG-G/�#���.^�����E���+����IUq�����}?d�(���"���M�M��)vz�3C��C�I�u=�K%��CgIe���b��	'���4Z5�2-�d`�[s�(s	\3�b`q/!_�����+{�M�Z7�TQ�-�����!�L���.^�[RT�yi$ML�B�NW��1���4���g$RC����`���t0�:!�
"��,���P}R���P?���p������G�O����N/`��1�0�b�3.~�5���N@�y8��$C�� fI����h0:�[��)�h9��5��8��+�Q���/��Gr%������JYe���T���W���2���������]bBg8y'��{�����0�����4�!M�R�B��p7Gc��'sT�SJ@t��B��Y��~��>9�!�nS���b9L��sV���I-O�
��U���{����L}S��o*dazP?b��J�E�����	�	�����s
�L��In�5\�{
�v�x�.����qq�pq|����A�[�P�A�������fV�
+�Q�u����v@4j�r�d�������L�f���i��

5��pv�)�/?����������ss}��Si��g�R��	k5�H�y���^��h����i�������!���	{+2T\.���'vV����N<t!nP���.��p��:zs���r\^��{�9}��w���V|%z�q^-�H��L��t�D0�*�*������Q8_��s.���v1��(d��'���("/���@r�,p��jQ��i1M@�P����TrZH�l.Ki�b�:��3|�y�T�+-�-��%�*��fF'B,����J�,<�Y}
��
�!�7-.YB�/U��!Qc���)-�y����W����V��Y�jNz�{b����sR|�#�%�Q;�@���\�lV����y%���(R���b�'�����OT��+�1�P��>��<|���+Y�Z
�@�:��;��=�c�k��8	��cm*d�3���p��c����Y�O�G:hZ�b���I`�=HdN��^���H2�7-�'UJ��hw��*�q��m4?SS
=��^���rW��q�W@��:u�%<��,�����.�����Qs��C$*���@(l?��0�Q�#+FY�q
�!�C.6��e�Z<�����j�QW!�\�0�w:q��]�6y����u��dn��������\F�����O?�|�]��9z���%�q�=O&,����I�2�Q����+�my�\M���0��5�R���r�����8��F]�b�
[�S�W��/����ep���V�U��Q-���+N
��x$������8N���`��(1.��c�cG;����@h� �z�!���:��&1���)cE#��	���������tx��h�`agM������M���&��c���TV���[�9I�������\T��w������!��$�3����3!�K���u<f���eG$#�e�-]X��B�^3�@@��]��
��w!��������@��X!	�g��!!CU-�%����[��&O�
��������PI*�5hLI"L��1H�8K�Q#�����}��O��PR����!��WY�������nT0l��4�T�-�-����N�O��Fa��
f��,�Y!*�4�xb�6�j�p��s��7���I�\�d�����h��d+/��0
���F��r�l}�O �3s�tHZ{�*:#d�
U:�$-��T��cl:��\���E<�$���a�0g\�c���j_CM�7����G��B4�#�r�����(��^o{>j���������}������0���H�n�8��^�qx/��y�>�p"p��'�y@�b��,�)jf&�|8D�\@'lZ�=a��Jk��w�~���\}�;����(�R@_�&u�H�����R�q3|�O9��+�����Z�"2�_���$��|��)d������3�3��Q.��E�����|)�`�vFc��v{<�UQvVp��1�F���:��J,�����p1�0���0�-v���nu������ip�u�}E�u��M�2@<@v��el���`2r���
l����3�'�����^�9�0~�_e��`%d�<r�^�(�����1-N��o/��'8I������)F�{��[�n�\�d�����&+�EA�#@v�V(Y%\��r@��u���}����h4f��HE��W�!���az�+]����Z9���=xe
h�<k�_��t�T��g_�k����S�^�=_A���1��?���B����`mL/�����5j�m4gMW���������s	��z=�U�L��c�����Re�
|t_~���2��M)�a���(jFq:���Dc�z�)���^~����pU��&�;D�����\��V|R�����!f|c1�A��'�����X'��������:�C�����e���H^�IW���f������{�������hl�a���cN���%[\aS05�������$�����>��������Y<�K�K�p=�''P�����2�����)��H0���MId�wW-�6������g��%$�l�p�s�a`�\������r�q�T�=%�n��+Z"'���i������v�����������J�'0��+�]�3������0�\
��|R���)C�*����T���f*��8�!J�����)D��$C��khxE<��&�	�����+T��W������V��h��[����r6���9����IR�M���Q`|J����oE���q�!�
�%�-;"t�c�@c�|��r)&�#�bY�3��at8x-����	��=��1���z�RL�}B���M�d4,(vo�,F�L��A� ���N��������=D ��)8>��4R�M;�B�p|��"G��4��&5�K���3�{d�)�R��n�����,��������Wn���WV�������L����������o0
��A���9��{��x>��{��e�S�=���}����}���0�	cvW�U�/��*�(�]�����A��:����
�\h
��k9
a��J�Q����R>_���/�/�I�N�"�0�=�;���7-���g���H��_��<�����z�����n�'�������,(�*\��zC��JK�nPb���2������8�DN4�=�`����	�,���
��M60D,���of�Gw�V_���m���p��k4���Z��r�������K��7�����w�(�g�B^��� ��K!~]=�������]�{���<����ueY��������I������_��������}o�����������o��Q���a�l���'�������3��u��L����@� d%a{�[��3U��7��v1^__+���<��!B�(������N��j���n�������F�:|�����(��a	�1�����������W���A��������X�%o������� %2��P������Y9�����v�U�I��E�w��h�6+�h����_Zv����Vkk�[�_�%��������/���*��BS�\������������.��fH �<�U�D�Yo�f�����
���
��+&����A-�
�"	��>�9I����\�!�e_���B�6/7�p�4L� J��(	3�f:V5��e4�MV|k�+���&�|����.J~B���qS��zG��ot�?Z+h��6���S�3j�����	d������!*/�$��	���"�.t1��i���E�y��7u}:�8`�|
y�{	\�e����5���o�"����P�a�dL��:(M]W�`t�#
0y}�U����\@3�=����� �QS�����\[p�H���P�������,m�6������F~��fGv �q@�FW������?��w���D���������?K�%3���t�p?r�(r�	c���f0E�����EB��cX�����A������4E�6�g;R��������x�2v|~���������A���t�G5��<��9p����#���:���aH���#���1�D�������������O��!*�����2b#��Z��;4	'>��px9;%Gj)wG�^������+����1���I�' ���5�
����Fz�`
@�d�S��{��@*q�g
pA�����K�!�=@��������\NW�
��@%�	B��>)��m�������=c`)�.�8��S�z#��&�����3���3�\�o����F�����t	�l��Fj�O�b.G�o~w0$4 �Hl����{& ����DP�G@��a|�4v
�s�Ls���m��HY�Mc}���?7('��9WPLQ��B���C��&:_$y���<�Vy���J�L���+M!����G!VG��m;��T}�6��S�,�P�1*������H���"5�DD�����v_��L��`����L3�G>��
e(d���;�<�2:_�cX�/R��������x��S2y�SF+�E���B���ed@NK��o������
g�3�_�[5M����q
�I�R�5	u��� "=� ���]���9�[0� p���%�������s�@��X	�p�
ie*|�+��$�����[�'O�����������l��
������u�ps;6�D6�Q�8�gC+��$X!�B��t���Q���@�jE\�2-9W;�GP�d�j����"����j/0}�Q5��t�]�>�M�D#�	���I�����L=���l����;�D�x�����(��3
��T�)���P�
[�!`{�:��0�eW���!U��TH���L�x������%��
/i�|��E5��lOu�J�jI��z��d��E�3��/|s�X��yq��"�����E�1v�F�8��zT5�.�(�Pd�!\�`\aC������%�m���Ff��!�{EW%�b���W�%����O�^�JF�^R%+��Q��PV���>�mNn��O�
?t�|��}�au2�S����`F(I�B��M���w�W&r��#��j*6.�fo�@��'���BA-n��jl��1��%�|HR�a�7%z�_sW[��Z�b
k'���;�}[q�+��B������������om�����2�+z�D�!9TK���o���0%W��FHv�!��^_�5��T�p����pa�u ���?Y` l����+�l��0,�jN�hJQ�0;���4��Ki�^�g������A���v4OJ�������&N�����;�(���� ���8�����<��=�sA7};B�M����\���gA���8�U���$i�z���������
������r�n2�tV��q�H�����H������o�������'�Tc�o�))�J"z\<�B�r@/�
����t��F$��������3�N&��-��}`5�H������#
x����\�����M��>�Hh��!K�3-FLt�*`�LAS��Drz{�|�9b��&�i�B������D���z���mA'����,�,F�S�#	�@	c�}�&w�=�:y�(��C<
��V�)l�HkMn��)�(	�#�+���E< `�"���#s���a���E�o��f����������#w��`k���K[4h���w/������2x3#��L�����f��������T�qzS����-��O`��5�*{��3��
���������j�U������;����s�(�`�,[�G.�7������tyI�Hkr+����J�����4F��������#�,a�e�����!-=0����Yb�v������m~���AYi��v����Lq����Fd�I2"����`b��Qd�&�1�8��x���2+lf��������s"����
��6��6�����L�%E���Z�G*Wz$����W �;����t =�HJ����^�!N���q6�"]'�1�jrn��8A�|���)k�X�-��Ap$��A^��Vz���!EF�p[����+O��7����s�!���GeH�x�}bG��a$F��c���E�%���{������5��������������_�������[�/�x��u��.Cmr���?���q/�����I����h���|
�5�D�
fw
���qR/�Kt�xP�������|4�)k��`!F�y�(�=���1o�O��h5����z-�M�|���:P�59�Vn����8�b�(
�������2(����Z���A{qv��0;�8+��#g2��4�^�,�/j�
uF�����8-��$e��
����#m�/x����<!_�^Xk�e���O\����s%�[��U��}��s���}���}=C;��Tx��d5�+�7xBQL36�.�&��}�#b�k������Z����Tr����S������;�Z|�]�y�w.Yy���&z�h��&�����,��xLk*��+
����"��#F�#<n�G	�X��2��]��q�Lfp� yP=)w9\���%�3{������3��[�J����iB�-�yN�F��i��IH��1�%�"���}']cW
�`�ui��/���U����j�6�X��@�N���`>�7��������.kfEl����6�d[B�9�5g�u�m�q��k�zDc&;P�d��m�J&���1���e�D��NB0O�Fe��d��)�GF���#��N�%�x�=�o��m@p\���@�$���E^s$c���d��F#no
�����t��0�&���f����Mc����b3�{<H1ZY�%+.
�{���������������[�$v�S���w.��=�:�)
T��C<�������u��U��{�ys|�9z|��FS�S����b+E�����~;Z
�S�+0uk���K@+.5���K]i�8K��!���
����Fg��,��[��V��4�SQ�I�S�e�����8�g�.Z��_)�����!����
��0��i������
�w���C���?����.����q*o�\?�
~���=�:/*� �?��=�~�����~�qx�'xu��S���93���n���d��*����������������@��?�6v�7���O����=�e����CM��l(�G���,, A�`~�4bQB2�-���[�1MG���l�e�3geK�t@�;6C��y��B�.����,&:�0���O��Ze����Pk���D���T�sh2�,�������V�����A�)��FM�W
D��fm7XW�mm"4v�c���}���F�����w�z��n��W5a�-��fp�MY��?�������%��d#�O~��>�V7��������g����+�@q�%��d�?b
R�!+�>�	"v�n�56�3?w���s����DXUx�E��|iI���k�<'8c�u����Z�yY�-\FjF�����V����Ct�^���j�^�	7��&a����,�j����:��eNq��X0M�@���"ZD�6��ZV����;��J�D�c������	�2���C���!{ X�L�L�C%��|�������B.������8��V%��&��z���E/H���f�.�}�*�Y84������7�:������
��H�u�X�j�i����c��N�T���sDy�B5Yk�]��Z�>��+L5��0Q�~�������F�D���	�#���8���'�����,�f�����k`��x����;�|�����+3�	V�[���z���ZI��	gO��C��w��``e��p���x������������i*�vI��\q*3��Ya484p����*�fu�(Z��j�����+�=���P���u�#�	1��M��6�H���YvK�&i��@�g�1�GOf)��C����h��������Jm7r�����>Q5M�[�k��w�f�Ta�PW.o�F�=W^q�I4��X�����7�,�%
8o3=��E���3
���!���������[:'�mD���#U�7]�
��$2l�+��Lv����M~4d����*'���Vu�d���P�F��G��6

[���b�����_M�l���u���(�iw������o�@����������$�����<F�J0���VM]��w�6k���L�������'��8��-,����IQ��]�/�����(���a*�hP6�u.��)%OM9=�� ���<P���C9@�Q��	���h8������2�����^�����}w�{2����0�u9��[z4h��4F��]�*6f2__t�Z�S�ih��
7)�3����^P�a2t�vk������j����)���ao��Pj�ru�1�?a9��l�,n�I�f��Y��1��( froT��$��Vb#�����E�d}B���^Md�C�Dl��h��^M��@�c�H�3�{�c�F��9SK����Q���e�3�^��(C�^�`�h��O&���^���@-�,�-��������~2�t4Z�D���������XB�H����6���n��x
#���k}��y��PT�r�I��L������"v2	�3���DQ(�
;S��M*���3|l������X���8|�P���GR�q"���ar^���d#,��t^y)g�q��BwY�W��(\L�A���&�p>FL�����x(#��]x_�w��v���� �6�}�q�'�����6�:�����������t�roh+�+���p�A���������������f%�
�t.�>��I H<}��������{�\���u���J�����"�U��BVTp�g?S++��2�3Xw�b#�Or��Qj�#�6�b6?h*����-P��d������4S��L�>��&Yu*� el$1����['�xx/�����=�doE�K8,n�^�Q�	�5%+�����pp|������t�!BP��Zs����h+0�
���I����J�v�����
��9��������4���
�#N9��V
X<B
�A��
��V���[8^���������n=���%������s�)��oe�L'�����3����/X���!o��(�L�bW�l�W������f���z�4�a�p���!0����(�<;P��|9L�C:T>;��0����s)�����`��6djG�d< |�:E.�D���n���(IUD=���#��pNQ��,��G���o#��N���]���d��\�*��Z��W�4�#<����Rw�3������1����A��S�	����_�:���XBb����nR4uY�u�~bk�%��$&iTJM�z�XL�������J��,���O)}�;����$0�D���\���v�bw�e�VTe�O�5n�o#�R3`��u���E@���k,�r���?�������2��BT\{L��x�;��,j��Bp��@5�V�4��p���7��_�A�{z|��Tu�\�T0�M���zwx�W���Z{+Xo57�j[_|�����hN|�`2��6��n�_���P����H��^���������DTNk:�z�QMQ?�z@b���/����QL����wZL���Vv�*\��x�H����5�xp���JU��Q��Z��H��������%�������y�����Z}�H�ny�`������Tq���Yt����H�9a�1��IZ���y"�a���
u�����_O�NZ	�q���'�)�QoH1/}+Z~e�|�
�N�%�R����1��w�]����;�F~���,��u�����(�%��Z���=
���i
hw��������Ym&����]_�@�j��[kA@s�ky�������7��]u>t./�_\_U��*��"�
�b����h������HCJ`��"������#���O����CG��:G�����w����Es�������X)@^g�/�RIt�5-��2��$I�_rP'���KGA���OX"{��7�����B����m�QJ�*T�	)/-���$xn�T�#��	to�^&���<W�R��X�\!����G���V���q7�������T/�hG����
-]s����+��R~m�v�W��U��-^r{��	���_[s��������chj5�;���b���^ms��Y d���"T���$�>��(����wC(�d;���n�jf>F3jJ5D��^�6���]����G`���)��U�������vb�����h�9�HG�)Nn��sR��$��/X*�R�R�a�}HjywQ�B�I�q}�DN��i�s���#����o���x������Xw���VZf�h��+\�{���q4��SPQ�]��Vlkf���S6VY��;���E0�SU=�yu�i�?dj���c��6-�a��������e���PE��B��{3�S_����>���7��U`���8�7��nR���@�m�U���j���#��X��6�����Z��oB��!{>�>�P���E"yk��X��e��A%v��}�,�k�7S���
����T��P�����h~���C�@�X���>���3��he���u��q4�`!�W��9y���/�Az�����������/{���Qcy����L.�LA;�r�������?�C- �y�-���o2E����
@�
^"����Ck���
>D0����S���;�W�����R��m��������Rw����{c{��y���_z�P3�1HQ�&�tR�	{uf�_�����/Q�!�i�J*��2��iC�������{��{����B\��E�S��V0��|!�-GwA�''i�}}�/R����y�3�@��Qa�	�Cx��,4����$�om���&<l�_���u��;oy#������T�=��#>@���^�����W=g����o5��3\����>G�������N&�s���&����qH&���oaG����Li�E}�mMs��9����qQ7�1v
YP�������X�����CJ��k���q��������M8���eGx�������Y�
8V�-��r�\����/��n����������b��=\�����"�^��B��I*��27�����'��gU=�u%�����%�� {��������I��Z6���%.��6����N�e��s}�nf<->�kc�������%�7�yT���(����JL��#Ga���r=(l$��D�
��W��ps�=Tw�$��6�
VY�B�����\����b������&�RAP�a.��������d��i��� c�{H�NWY�YX���<��:-��u�%*��5"&�fI\���uR2;��]Ag�����c�����E�vR���7K�w��*�3�T�,����}"�����"��@���Wa�g:��I������_����{�����#�������,�t�r}]�Q��u4t_������������=��$ �e�K�1��m�14X�J3v��K�6E~U��o ^������H���%�]Ra`D\����M��%�_���tY�H���g�~����k,����qf{����iF����e�v��D����Wh�.�'�@�h
��"Q�)Nt��:�b�������|�����g���k7�M������a>���x����w�Nc����
-�x��(��	i=��4�� ���O�d�r��~���/Uz��E�Pd��B��$?���6\"�;+�E�&[�0��S�sOIg��>u�pDj�FS%�0����Z5l8����a���Q���Y�����r�@x�8�� h���=�)Q&��OjEp(��&�N���67������NT'���:�+���s���?LP�l��9wL������#�A��bb'�w���|d��n����6��w+cy���:����Z�
�vG��/�
��&����-f5_��(u=6;b~��"G��^y�2��.�_�7����1h���h�)�1����	�'�$��B��K�;dQ
'���Y-�A.�zy�i���n�z{����a�p�
O���������nW�/���$,�~��)j=KQ���(�����������������T��{I���T�D��b�ZP�L���8^[M�����z��|t���gB>��
����V��&����MQF�����*wLG?=��:����1������H/u(�n�Wq��DZ�_�������W�:���&�t+����D�n���r;+��������yL�\����9����#����Y��fSR>��`d�������.;W��O��xS��"3�jo��{�dO����69�f��j���1e��D3�Dj��!dH�eB>jVJ��ST��H��]e8I��{���=���+`iR#����
����u��~>Z���I�L���f������MJ�s�M��}�_8�7��e���,�;���m��71T<�N���X��~~���ksq������J"����S�J.Y�B�9��g�����������r�_*�@��h+ln�4���`����_ng�_.���R�����;��~8��.����6/������\��T���_����+>�(N6��o5I�d�8rw%�(A��y����GFm/r6��� �����l��`/�W�G��g����?�^�^ `({���%��h(�����8����fm�qu�"�Ni,Q�g�O�"�����8<a���&�3A�j�B�znu-���������;v@�)��K�W��K���d,Fd)��YI��8���kH�;��(���:�d�7�[��j1! ��D���ED1�T� 7UdRrL������T�8cz���
H�3��$f�$�>a3�L�������.hjKt;��3������c�8����8B�@���H�V�a����V�T����S��#0��'��t�t�s<���P�E*�����d��3`������h)���
-�
ka������s��FU$Do'�K?�)���Y4
�$�O ��,���&AO�2�$s��h8'��[k88u������4c��@��$�n�s�1�\��A<(@���S+�����{���sYE6�f�R�'hC����r���A~t�8kP�j��_vj����W-P�q�zx�N�'��.���^�i��<�K����]���-<2��~��5�PQ������r���[4_���K'4'����Y���cx/2�cV�����.����h��)���fw����������\���|
��H�,K�{�^[�16k���EQ��(G�$���Y�w��c.r�"�l�D�M��73���)��	4F�:?|�9��z����"����O��?�6C����P(�	��@T�
Zm����*Rn1���e�%�YA�$A��	
�(����L���V��Q��3G�����$3V�U�^F���Orz�q3Z�Y�+h�+�
�|�ep�<��^q6�L�i$d��e7�����������'�
:]��F�m������}���.�>1_��f6�[�������i�/T���@	Z��Uf�C.�;Y�Z�O�m�
�v���W�/M�\�R����*�I��=�.G�r���JB�h[N[��e��\rCI,\�
����j�kER@E����D%+B�9zt^z=K?�S��Q��F�M�L�n.
g���f������(�=VbgJd�Z">��C�%\;e����B�8���QF2�����`SF<���)���1�����Z/Q7������y�Ux�/<�����N��3�[&�J���\`�S��k.��5a�Z5����r�����R5&R5*�F�'U%uK��]�r�;Z�(1�vUl.*t5�����pd���� ��%��Aa���-�UOx��V6��W�&���.Zcv�x�����v����V�����m
:)��-d�R
���Z�Q�pm0a������_^��VV��h��*!\r�Y���U6K���������h6Y��W���(s��o������2Q�Y��J(`����E�p����~aOSS�q��eS
��JL!���I��L�.$:/[�UC1�S���T3'�vyWdQ&p�����,��c�M�����6G2��Ey3�zU��'k�U������*e��m�f��Fx��"����
q����^`�@yAN1T'�}�
�`���i�3�-��/��Zb/I~��o$,�O�pZ�9@o�Z���
��D�����Z�tT�	����Yz?�~����J�U���6��3��xQ���XN�e���c�ZK�H�{����.��&������y�3d���<����x���;���Qp;(�G����1���U0f�u��%���P����f`��8�k�teB�C���`��Q�K�|�*�g����[��i;x��z
�
��!��:GZ'�K�?�w"�6�1G:���+��Q�63����3���(&��j������`����KHZA��JY���
���i�w��:k�y�TE��u�������l��SL��\��9��
��q6�1H�
zy��)��mC�L�����&}�4c��
B)���zH�*����nO`�SW4�6a��H���\!��.2[,�	��	ur;��Q�P�T:
��o�~`;�0f�M��@f0�!�p����q�at����JO��1^�O/�����!q��PM{�X�vc���X���6����EP��U�E��I�\
'�?��Q^�XE�����1������h�6P�~J���/n���x��W��Y^������JY�"s�^V���,���g)�w/��������y_\���pj^ u��A)���$N,u��jgDb���u����V������yX�X6S��+�$�L�%�ME���;��LK��KQ�Q"���~���:`_����|�=�N�X����k��,���m4�*���a��G��^�������p�,&*'��l���4"�E�D��G���.��"/H�)����04���V��R>u]`"�Z-����go�^���;
g8��5P�����U�5��X��Q��3�X���g������.���L����t�VbN�
���L
XF!���q��q��2@�
}:O�R�V�z�Z�_R�\�����A8��`E�|�n�s�XR�R��OlQ��
�&��z��������F����p���K��=m�A�t�G9���%H��@��~T���i�L��.:��|UQ����#$+4��0�n�C	�����Nh�Z���}�]�b������_�Tvl�E{O._	3b)�A,��{
I]&����t:��X��I�;T�
8�G�J��$$���n�p�4`P\p�Z�]���r�3J����
�)o1P�&}�����W�qs,�-����3�J���A���v����f$��r*���>���t9�6��1L�w�K6a\K9�
�r5F\K`M���&��F��{,}'NNd�5bb����M����[�6�xs�()��<��%�M�
Y
�q���!u���+<"���0K�'�[!���5"��Y�D�rWwT��e�=u��=�=��me.C�^�`�f|Q\������'�Q��l!S���#��8�`�@������M���!B7*��7W����X.)W����aSc!Y�c�7�1����1G�r��i4�s�!jG c��ya�#�X��By���F�����2���_�2�k�,��6�e]>�*��2���*9�I���9R[i��>U���0 ��������$�����X�����BB%��/T<��`��^���k������<f������a�E�\�\�����������q��f�Kr��������g;��L�(����h=Ca�P
$4�^r����x�6��K�Y`=3{M����1N����M�<A���R�@�g�����[��)����bN���T"	"K�i
�3��7���x��vi����+����vDg6�7Yrj�$Y�]������E���L�mj$��W'W�3N�T�k�#������i|W/fV�a?
�>�mx�Aktqy6]������6P�rq����i<}8�{�s83���1�3�K���o�4�M�H���	�N�u"��$��j�)������~� H��:G��g��`�! ���e����l#�k��T�zz-���v7����G���&��TF5��
��)����-T`xG�G��������k�_�����/^�K��<z=F,(��T����������3=���\��J������Dt��%u�������������e�v�����!D�=v�?;�%�n�{�Q�fJ����I�����"�����t-O3�_�:/�	���d�W��v����{?�^�+�T�R�����\�����rA :B#�4���������kw<�����k�z�^[,Y^���$�����as�_r1�v�s������������%�O)��oC������'jL����|��Gp�.A�&��u�8���"�L�=�����w��y��������8I7����bc�z_�
�h��v����>lV!�����W����A�:�����t�C�
���^P�����twI2��U��)�@H
�6��w� U�)5���/;Jzpjo��\f	i1����2�J�"J7��	�'�~��x�A�7�����t��L�~�s�����&�t��Y)�3D�X���?F��h���O7�r��oN��8*����Bi�X�u���;a��C���%��9`}�C0�9� ������[�:�m/E1��H���d�"�p;��"��v��R$M_��&H����m����K�c"$����J���*Q�W;��j����l[�c���w�`L����n}8�v����l�+H��kw[b�"������M%����a��1�N�q�*���ZC���u�D(>D�BQ����z=�(u�C��?a�~��s5hw�p�]O	jc�Y/xy�C5�#^'��de]1��|�U�z�>�J����l��{�@��;m�u���Q��c�Z������7�����;�D�#v����K�_Sz��d[jC%���J����}�{{5����;��~�z;QE�
`�[/����d FF�=x58?`�C�@��TM�iI?.��(!4�@H�1K��A�������~���!����\�"9��=`�8�!�r��e5�E4^'� �2�D�/g���U�W��:�j�%S����2�C�M<W����S���g?C_j5q�[1��N�Qb
����z ��K�a'�O �@�3��s��W�9]S�v��-"!�����`�P eH�����C�b�h���rq<���[+���]�����I������l�D0�
?SY�>J�07�-�2}|{�;A���0�Og���v��2$��B��&��Z�iAw�MN��y�pT���l���F!F�k�����8���-���k7��=wB=a��{��l4rY!�bU�IU���ML��$�mU��`��Jr;��yN���]�VV�����>��;�/$�~H�4*R��Lx�|iq������h��M�����Gw�f���9��0���DYr]y����+:i��:n������/`z���)�k��i���s@T�[6����r���I*�u�a��Bq����{37�@�Sy7cC)t�
w�d�������Z����D���l����.��K�Y�����
�y�ASwW\����'=w\U��������"\�����z�LbC��O4�M������$l�g��1��G���]��%��[
���%L���:D��)Q#��
/�q5���?�A�np~q=���6bC�7a5H�\��5Z�
��I������"��6�l�"A�B��o����Y1N������Q-OV�E����~�����N��_�,�!,���[����/��������%�$<_�� �.Jv����-�:NB�w.E��$���e��U����s�I��8 $LH��A���1���Y�o�k51����x��G��_�t}=����.���3;��^�^�liys�$4a�j�������d2����\a���2[����	e����lxzN�3����-i�A���3
z�����vm�B��jv+�*��qkz4m6����A�+�Ud�h��hFn7z��U�L�1N�g^
<�Q_D��l�d^���Xm0e��2�}O����L�X�����*^���YJ���D�R�'E���'�=�\8�,�U���,Q�����\���8<w�y;�n��f�����G~��S�su�
I�(�vs�r�B�.R�X�]����3/����F�K���y}��t�o&�����vx��qn�1��Q���;�Z�B
}������)����+�� f#d����tA��~#_u?�������x|;�v��f�6����=��y�T���x���c�G���-��3C������r�e�K������.j�Z��N����j��c�])i���<q�1�x�M����Y�/�S�g�g��cw����;v����cw����;v����cw����;v����cw����;���la�.

#51

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#50)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

buffile.c should stop pretending to care about anything other than
temp files, IMV. 100% of all clients that want temporary files go
through buffile.c. 100% of all clients that want non-temp files (files
which are not marked FD_TEMPORARY) access fd.c directly, rather than
going through buffile.c.

I still need BufFile because I want buffering.

There are 3 separate characteristics enabled by flags with 'temporary'
in their name. I think we should consider separating the concerns by
splitting and renaming them:

1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp
member to isSegmented, because that is what it really does. I want
that feature independently without getting confused about lifetimes.
Tested with small MAX_PHYSICAL_FILESIZE as you suggested.

I would have proposed to get rid of the isTemp field entirely. It is
always true with current usage, any only #ifdef NOT_USED code presumes
that it could be any other way. BufFile is all about temp files, which
ISTM should be formalized. The whole point of BufFile is to segment
fd.c temp file segments. Who would ever want to use BufFile without
that capability anyway?

2. The temp_file_limit system. Currently this applies to fd.c files
opened with FD_TEMPORARY. You're right that we shouldn't be able to
escape that sanity check on disk space just because we want to manage
disk file ownership differently. I propose that we create a new flag
FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags
controlling disk file lifetime. When working with SharedBufFileSet,
the limit applies to each backend in respect of files it created,
while it has them open. This seems a lot simpler than any
shared-temp-file-limit type scheme and is vaguely similar to the way
work_mem applies in each backend for parallel query.

I agree that that makes sense as a user-visible behavior of
temp_file_limit. This user-visible behavior is what I actually
implemented for parallel CREATE INDEX.

3. Delete-on-close/delete-at-end-of-xact. I don't want to use that
facility so I propose disconnecting it from the above. We c{ould
rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.

This reliably unlink()s all files, albeit while relying on unlink()
ENOENT as a condition that terminates deletion of one particular
worker's BufFile's segments. However, because you effectively no
longer use resowner.c, ISTM that there is still a resource leak in
error paths. ResourceOwnerReleaseInternal() won't call FileClose() for
temp-ish files (that are not quite temp files in the current sense) in
the absence of no other place managing to do so, such as
BufFileClose(). How can you be sure that you'll actually close() the
FD itself (not vFD) within fd.c in the event of an error? Or Delete(),
which does some LRU maintenance for backend's local VfdCache?

If I follow the new code correctly, then it doesn't matter that you've
unlink()'d to take care of the more obvious resource management chore.
You can still have a reference leak like this, if I'm not mistaken,
because you still have backend local state (local VfdCache) that is
left totally decoupled with the new "shadow resource manager" for
shared BufFiles.

As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?

A less serious issue I've also noticed is that you add palloc() calls,
implicitly using the current memory context, within buffile.c.
BufFileOpenTagged() has some, for example. However, there is a note
that we don't need to save the memory context when we open a BufFile
because we always repalloc(). That is no longer the case here.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Peter Geoghegan (#51)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Mar 21, 2017 at 7:18 PM, Peter Geoghegan <pg@bowt.ie> wrote:

As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?

A less serious issue I've also noticed is that you add palloc() calls,
implicitly using the current memory context, within buffile.c.
BufFileOpenTagged() has some, for example. However, there is a note
that we don't need to save the memory context when we open a BufFile
because we always repalloc(). That is no longer the case here.

Similarly, I think that your new type of BufFile has no need to save
CurrentResourceOwner, because it won't ever actually be used. I
suppose that you should at least note this in comments.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#53

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#51)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

Hi,

Here is a new version addressing feedback from Peter and Andres.
Please see below.

On Wed, Mar 22, 2017 at 3:18 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

buffile.c should stop pretending to care about anything other than
temp files, IMV. 100% of all clients that want temporary files go
through buffile.c. 100% of all clients that want non-temp files (files
which are not marked FD_TEMPORARY) access fd.c directly, rather than
going through buffile.c.

I still need BufFile because I want buffering.

There are 3 separate characteristics enabled by flags with 'temporary'
in their name. I think we should consider separating the concerns by
splitting and renaming them:

1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp
member to isSegmented, because that is what it really does. I want
that feature independently without getting confused about lifetimes.
Tested with small MAX_PHYSICAL_FILESIZE as you suggested.

I would have proposed to get rid of the isTemp field entirely. It is
always true with current usage, any only #ifdef NOT_USED code presumes
that it could be any other way. BufFile is all about temp files, which
ISTM should be formalized. The whole point of BufFile is to segment
fd.c temp file segments. Who would ever want to use BufFile without
that capability anyway?

Yeah, it looks like you're probably right, but I guess others could
have uses for BufFile that we don't know about. It doesn't seem like
it hurts to leave the variable in existence.

2. The temp_file_limit system. Currently this applies to fd.c files
opened with FD_TEMPORARY. You're right that we shouldn't be able to
escape that sanity check on disk space just because we want to manage
disk file ownership differently. I propose that we create a new flag
FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags
controlling disk file lifetime. When working with SharedBufFileSet,
the limit applies to each backend in respect of files it created,
while it has them open. This seems a lot simpler than any
shared-temp-file-limit type scheme and is vaguely similar to the way
work_mem applies in each backend for parallel query.

I agree that that makes sense as a user-visible behavior of
temp_file_limit. This user-visible behavior is what I actually
implemented for parallel CREATE INDEX.

Ok, good.

3. Delete-on-close/delete-at-end-of-xact. I don't want to use that
facility so I propose disconnecting it from the above. We c{ould
rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.

This reliably unlink()s all files, albeit while relying on unlink()
ENOENT as a condition that terminates deletion of one particular
worker's BufFile's segments. However, because you effectively no
longer use resowner.c, ISTM that there is still a resource leak in
error paths. ResourceOwnerReleaseInternal() won't call FileClose() for
temp-ish files (that are not quite temp files in the current sense) in
the absence of no other place managing to do so, such as
BufFileClose(). How can you be sure that you'll actually close() the
FD itself (not vFD) within fd.c in the event of an error? Or Delete(),
which does some LRU maintenance for backend's local VfdCache?

Yeah, I definitely need to use resowner.c. The only thing I want to
opt out of is automatic file deletion in that code path.

If I follow the new code correctly, then it doesn't matter that you've
unlink()'d to take care of the more obvious resource management chore.
You can still have a reference leak like this, if I'm not mistaken,
because you still have backend local state (local VfdCache) that is
left totally decoupled with the new "shadow resource manager" for
shared BufFiles.

You're right. The attached version fixes these problems. The
BufFiles created or opened in this new way now participate in both of
our leak-detection and clean-up schemes: the one in resowner.c
(because I'm now explicitly registering with it as I had failed to do
before) and the one in CleanupTempFiles (because FD_CLOSE_AT_EOXACT is
set, which I already had in the previous version for the creator, but
not the opener of such a file). I tested by commenting out my
explicit BufFileClose calls to check that resowner.c starts
complaining, and then by commenting out the resowner registration too
to check that CleanupTempFiles starts complaining.

As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?

A less serious issue I've also noticed is that you add palloc() calls,
implicitly using the current memory context, within buffile.c.
BufFileOpenTagged() has some, for example. However, there is a note
that we don't need to save the memory context when we open a BufFile
because we always repalloc(). That is no longer the case here.

I don't see a problem here. BufFileOpenTagged() is similar to
BufFileCreateTemp() which calls makeBufFile() and thereore returns a
result that is allocated in the current memory context. This seems
like the usual deal.

Thanks for the review!

On Wed, Mar 22, 2017 at 1:07 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <andres@anarazel.de> wrote:

I think the synchronization protocol with the various phases needs to be
documented somewhere. Probably in nodeHashjoin.c's header.

I will supply that shortly.

Added in the attached version.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

parallel-shared-hash-v9.tgzapplication/x-gzip; name=parallel-shared-hash-v9.tgzDownload

�O�X�}�[I���
E���
� ��=cl�cc/�q�1���T�jK*MU�������������7��m�2#����8�A�F����I8h��M�����|�?����{{�������]�W����vgwww��V����������9��?�4��x6I�9���p8�{Y�����i���Z���I8�Y�4��8N�A��&Y4��V�i��o���l��������������{{���������?���#W���D����.��������l�)m������Ne7�
�D)}o{��A����`�,L�����_�����z��Lj@+�%A�~�H�fip6����R�M���Fw
�
�ar���<�	M~O�I���t6zAT0�^A���za?�����A�)�q=>���,���������w��V7a0��f�mn�4��������&0O�+Q?���U����KH�p�`R�&�(]���Y��k;x���zgj��4$/s�9)����e��:�q]�q���/_wO^8�C���?N�9Lw����/!�,��"U��76l'xz��M���ZG�u8�L�0 4�Q`X�N���jg����@����"��'���A��4����7*�/��6bM�V�fS�
v`}������������p�V���z�_�d�~	�3�5\}
������p����
;�{���fsw��o�"���h4��^������_�h���[-U�;
>y;e�)��F��e�;�;n���h�6�f�Ga���7��T��T�|���xG�sM�=����|6~��,���u��X�z������)=\���l$�4N2Bu���7�g�����������z���Y�~������U�����+��0�F~��3���}|��.���������E��1����# p�,Q�-��$������t6�e��D�t��i���-?�_/�xz
�Z���dq�h�)��cg������"z�#��'p�YH{��QWo�P����4L�:�@~�x�>���|6�8/gL��O����1-`J7n{c������Z|���s�F���0C��x�~
o�|�:������iM����x&���
�"&O���I�B�'
��x_����"�57��v��zkO�v[���9���8NCst���x���'�s����F��FN
������������6���)�(�P�_�t�8D�D��_
�8�&8$a:qX���f���;$���q�MG����&w�1D���9q���$��Ft$����&�2�&A�np�nfl}�:���nO��_�>�s�}�������/�����_����:��$�|,g,W�5���\;}NL�&�0�&������,��<�F���/;sQ����Y1;A������r��������y�	��
�#�7 c�J��	>��$�8w�f��;n��	t[����(
��p�]�H>�_�@��4�K�'8�K�����CD	������`JFT�U��
�	>��E�-�h4,�@+��B_��O��
+����/���%��������O����d���+<Y�5�<�H.X�Z��S4�i�&����n}_����2�$#-a&�V.@��?�>���x�?~F����x��mZ4����$J:��$��Z�O�h<��\�1��0�]����C�@ZYmg�y���#F���@B /�r��9�fC�����D�>2@T��&���8�h��������l|����F������&}h~����S�
��pKv��v����7oO���������	�H������ 'WF2#X*z^�F@z��f$��*1d�R�B=��(��$�HK�����Yr�%�2Qx�8�w���nyO�+�;����a���j������x	�k�dU�Q���oMc-�f���:]|
��D���m� s3g*����C
��}+��R��+�T��K��S<F�*l
C|�`�xs��/�f<�94�{�!��kH-��V{�g��5aV"�|{T<4���=.��0�:	&(�="�
��q8M��Q<K��u!7��qXL
�*�Z[;N�B����"�GzK�$�d@�/�!�o�b�_�M}�[ �@�[����T��!��.mc�=V������=�������!�G�WZZ�o[���M��5Z����8�� uUns�c�i+i������	�Zf�d���A���M?��s������������(�`@�m(c��P����� Q�kA�g#z�o�G��� ~��<,:����������2�}�?p��R8�^�����/����/~C|�n��>!�@,nV)�� �\<4��CI�C-�+����^���x��#�Fw�p<����Z�m�G��(���!���y#��9-D�v��v������u�ST���aU��Z1Ck������M�A8T���g����h��>$��t�&�<���N�Pk���������kK�������`��R�R��8���F��M��#!`�S�i~'��r|��M�1�'�3b8��IGt�f��fSZ��do2M���^%��������m�[F*k��E���f�'@�����Y������(F���yh�^�����uZ�B�2����}���B7��<2��Z�/?��4�Hi�h6�RcP��k���������9���g�'Y��_�����E������N��
�i��{�9�	�Yi��Q����M��oL��di��A������M[��� [�c��E�t���u2��
����@0D�w�9L�����T87q �����:�3IDZj�X!Af�h����H5�5R���&)�%J������?��,��f�po�����2<� ���I���]��>������c,�Zc�>?��g�^���_������0k +��dp�<^�q��t�5�8t	����V ��uy��2�%a:L�jL��:p��7�Y��jh��:�6�[j����(�G��Zk�Q~�I�V3m1Z�n�?Hk�a?��;��e���N��L�x�*��yN�w��y@0~N�{Cz!}B?:k4�
��h��0�	b���l��_u��
�k�+�B�Z:�-�!F�M�������-����9`t�DgrO�=��|$3J���kSm8&�wk*|a��Z���[X5��a���n������.X<�w�Y��w����Ob�3���ymMX�EL��$�
3�������s�������#j�[��a���AN���
�
�!��X$����*Z������a�J����dLY���#wIW��\2�"80���n��s�{��#;���I��4���Ij'���`�K'-������I�
)>�O8�9���������E���a�S#�����&��~2F�)�M����7��B����7�,3x �\AS G���_��a�e�9�b^S}�������cl��k;�l!�6������1�K�P�����Z�����m��ezx�R�aGq<E����
&�p4�������'��|w��C9������rC�����|���o���Q��Hj
��,�{�-MFsg=�rtHA}����;����D
4����5�������%�����������3�8�H�j�)����G����m�������sz^�����Y2�.Z�+����o�����m�+�������Q�h��(>������Dp��������;J��z��#m�tm��'p��U.���$��r��@�
5v����U,ie���1���@�5z����n��6�����wY������������(�9�RC�9r��6�K�U:���oRx���ktY+<Z���SpG��p�C��a��������kRj��f;k�o��~i��j�wcvn�����1R:��{O
�=��Vy��}��/Oz���������Y���|�
���hA���J'�K=|��y��
���=7V��C�B�!o�_��NH����I�v���DJU
�<������9��k0@~���I���u}����>,k�C,�r"�\�<��-��k���}�)X%S��W������Z�:�V�����������6{?�7�#����$�?���g��0oo
�)b�#��G�,�{��H�n}��?����3`�%�{�a#b(�P=���`)�B~^gn:�48d�)���79���x6 ��_��04	��]R�C��J��<u����:�-)�P:�����U���
k��2��1Qjo�(�����2�+<�����.G�&���rJ����>�Z���=�z�\����Z�!�v��Z�&H�9��#k/wxC�Z�7�+oY'^v����J:����e:
�D���������}� �;�I�	n���HV�C]���������^���q������)`���v	T��=���������s?��4��8���<��h8�:wyd&\��K�u<;��;�g�]]��5���R����h$��'df/�mq3h�D
��G��2�d��z�(��.x�����E���m�k�aT>Ej<���?����[���j�~��{�n�v~������p��-�������#cS������KF�k��~e�!O4�6���W~�~����e>|�K;-�!@&�������%���e8�g�������u5
3\�N�e����Tq�pYU;��X�f!����u&.i�w*&)Iv.'~�G�K6b�����X&���������^Qs�v<���DtV�x����?�������.��������e?��{�9_�����;NN��>��E��w��M������%�����&�����]R����A?����YP�$
n&�)N>N��L�n�;����S5����HK���5	���I���sN=������z����<������M��s��~5�aO�g�'����_��}H��K�����YM�y�����/���3�`|�-��=-<�F�h��k�0&��'�#�o~������O�L���3x0j33�T0P`�`&�D��ck�'4��g�,t��� L��?4���4���m�pZ��K�SD���Z%�(����kubf���+��,��B�������Mw��=?�M�q��*��fk�sH��0K���Y�W��
�����x����r��s�(����a�U
��QO�1��sO�'Wh�5���\��v��Vg�����Z�u�aHYW�a���&"���a�r4N%�s���"*��q��ek+*�L���!��p�E/�$�������&QB���DJ4�w�AmL�IqpSG�6�|?!��8:r���|C��#pHr�_�C�S*���!BNf���S
:������^�^���q��#���57�01B-���7�U/F���Q�o�e���F����/8=a�^�J0��8Q&�����D�s��'C�<g����
3b�]gZ'��}R���0��VN��mT����r�[b�kOyZ+hm
2���Q0'���O��`J���'�o���m�f��Q3�D��8���h�Xl��{�(uM�(���:����[D�=\��=��?.�+
���r7�P�n���*2���5An<�_�.yO���-�M�����PhVi���/'�2r��4cQ���67�>y��-�?��_�W1�V2����kq@(�3i�CYD���E�9���4L���Wga�8MZ�`J��&�6��
�S��sP���KY�`�g�K��8�Z{�e(�6�N���O��k��`|P��6������+d&���|���&��;@��[�Ol����y�R5AOv�'i�Jh���-������l�1�$����!�wZ�����I���^��H��Qp=����",�6@�k���2�����8��htG�	(�
�1]Wn�M�WUQ9&����!��~a�&��OB!%�F��OGB������f'�@.9���Am�(C�O�/�fy�\Bs�_3�r����gY���nKn�>3��1;+�_�o�~*<v�Q�pUd��a�B���y���Fd�9H�`�d9>����`�~H�^7���a{���"�(�!��!���y�Lh{v�H5�.�|^�E�O&}�&S��;�rH-*����7���9�q���f�9��m����I��)�,�����L]+l�S�%�P�As�/�����������9��]����$��du���gA��^~�����������9��DNYML3��tD��&��$�DA2&�fs}/�t������$G}E��P+�t�@xE-���o�d������@KM�TR�-�S&)yt�d`�>���QM,f�q��	&���	��	����<�����sJ������� AC+�>=��>Lp
��,�zm��;��Q$:�_��L.	�GQ/	�;x��������z N�[����6�Q��-l"�� ���9l7�������T�J�
��(X�m8�%���+z��1L��Y4p�fsK�������p[m�4����Q�Fh�}���K1H��h<5c���q�����&V�q]Ol�(������^0���[�5��|���@��1?�������C#t�����/� k�2}2����9e����5��YG�(a�/� 	����h6;����0$�����&��h��G���^o�|����5����)�D��`��OK(U?�,/���/M�}�p��&C$�6U����ho����"��������W���E�9�SL��^%�l��n�w��:�t���j�F@@���`��'Q`��
�2�I��=W���"`���m?�0��4�pL���c��hV�G�s-)F�
}�o����d��r��/p�e��N�H��w1�����~�x
���	���dV�u^�I���%%Cp�0&!�- �*d���c��e�V4Pc����h�E�������e�y}���]!����;
���.�_cJkN54�s���E�ky�'�I�1�K�;"��'h|��E^GJ�G(�9N�� �*I04�Oco�-�#N9��zp��XL����az���8�D��q��	\��L�%g�V  ���3�N��2�47�;����'�+�S����%��M�S������'h�������+��7$�w��wg���v�NS(`����s;&\��7|�i�G���J@:�,�~���Io>�o!	M����5�)	OI<�0�%gzd���-0��|�;(����z���A�o��s�;�#�=�i:W&����w���>�a46�b�,�G)��`����<�dF���������!����-B�-�C�sJA����E0��m������H�5#(^�c �W��S��*t�����5��{�����H�t�q��X���a=�n����C��7
�:j��6�c���+t�K5�E�x['y��1��,F@3�5w[���%b);��%LbE4F�m a��B�%GQ_�	��q����;aR�`r�Mah�	��%
�1�aF��L�	�Gr-��� �?=Y�����FbF�o��4�z��t�3�d��5i<��4�P�2$�4���K��i+rD��c��&�1C����)��Sy��A�DzM�D����2L�������.�N)�x8"#KT�v��]$�
v��D�6	u�`
��?��O�)���P-������G`dAc�d7$TL���HhE�m\����m `��
%8����c��X�R�`�I&MHx`��1��o"������S}��i���w(���A6��5?�}�����{��|ZI��%S���`{���l�>`�0!�B��'�/���F��l���DQ��(���n���J�.�/������8�o�Os�I�U���u���8=�:{yvr|u��\�k�����>I-�;�xx0��]��y��g�g��oi�PK|R�M�#��Ln�CcJ
y����3���Xx��/�e�z��H<i�"�c���U��`x�����eE)ubb�Ww���3Tk�lC~V���ET�������i4y�?Q/���g�Y&�1�#o[�z��v��g�����E���d@.����������gi����`�GO����y,��X#B�D2D�)Q8Sz��"�e��H�N8m/4�OG��#���l�������]��.�R����aa�.�%Y^�)kL���$�m�$%�M�������4%^\~o�,����B��#-m"
������� �������O�q�������i=HzQ�����#��1x��6l09Y������Ih�q6\�������|D� g�����4��\M8���?�#b�����D������J:J�$Ii���$g�����Wrwr��?4��Mr�Ca�����Q�����G��6�8W����Y�����c�rW�{5�N�F��j����eZ����j��-
=��J
W������)%I�/���-�J�T�L��5�4L33������lk��3L����O���������6�������"*��u���B�f#
4�y��Z��f����w�(��������"K�����6��f�b
s��	xk\���0�.^@�[�.�80^1�7�C���s�
H��^��~]g}`]��g"��(����q�`V���-���Cg�x:����j���� ������mw=��4*N�t G�=�2�%4�T�
����Ibp$W���'���*P���F3/a������Wk&��w}�����b{I#C���������������G��7/�����^`�*��P=��J���JN"�R�0�;��tn��������;*�\��"Sv���s-/Ujk����7�<�IFZ$[��9+/�V�����t�^����O2V�iM.!�t�p����������$�3����	-.�p/�9��;G�/��%��G<���
8�J�8%I�B�3K�Z����L�96UG�3c�dUh��������Nk�E�D.gus��}��r��%C��"�k)���2�����
|*�N��O@/w)8q���F��^1A��	�k�����_ne��,U�mp
������f�g(�c�b���ig��>���=��M�Y�6$����b.)��dN
���{�mEd��f"*h(���R-�������F�6P��Ff��>�C�U�_�I���I0����^W�k�XO�r�����R/I����i��Y2���t���A�2��X�:�T#�f56�w�Z�U\���<}�/����������tM�������~�W���V������T����t�~~|qqvz�}m�_9���R�{��L`��-� B<��������,-�@<@��� ��Q?�F�����$/ Y']m���������>G��i@����\H�\l�N�)���9����t#E��E�Q����TZFU�3G]R`��8E�zX"�@	�JP����_ ���	���t&A�^:*�/�����KAW����si=m���o�����?��{��Gt���u_��_:��~�������O������
�n�$������X0��%�n��<�;��������(�B��^���ZC����)SK�b��u3�^�����I�����m*u���z���u����Z(��/e���1NS`���!J
��DIom<<x"�:��S�*����!P������*)9y:��b@�o�q��3p��:����pa���=������ahfQ�%��T'i�#��f�S�SH-X GNh�q�������tk��N�%-ta���`��i6;���~gP��^���\����v�� ����j^�����
9y�����W���S�8k���g�`_����t<�e^�P8F�������i��0�F�����hN��Eroh��nB%��YJ��Y
?'#�;E����BhI���r�G�(�\b��q�_b��5]d�Y��9f�������Q>g���*+�7&!wNQ�3Y���h$���h�,f��������N�������r?v�7��n��Gy���fK=����e<�+q^C���{�������������t���]t6q�}T����J������
;d�ml}���bX�#�>Egj�q������saD��R���t����U�&���V��)���R�O�(U��a����0����U\'������(�v1<��2�t]?�QO�\��G	�/���[��
T��r`�����M�\��/��"?d�3�����s��<g���F���|R��df=��-=���I�S�j���Q��<��q������������1�
���I1F0�����d�y��{	�����1���#��}����O�k�4
���8]<z��0V����b�1
�������1i�k�����H�U��]�����ME2��QnbL~�7���-%��%��_�J���;�';8���a�f�(�J���b&.M�(2����w�����2�q�8cc�uO�3<�A�;_�tl��j�dLQ����0��edY�[�4���k��Z2�B
����x�"�<�q+6=�]�u?���|�1���5h��7(a]����_��+��c�
)!r&�L�$��H�5�u�x�N���t��|%+Y���$�*m��S]����V�S�v�~��sR(9/����e����p4��{��6���� �����;�4�Zf]�����7���_�����c�v��
�b�!����/E+�v������~`&6��%�cjH��X���M���3�G��Y�v���+����d�����1���C�����BT�\��i��E;�Y��i�yl���y{��q��������
54��efo�kyK<q�T��b����0T�h(tt�H��)�t��l��q�t8#"������)y�K��P(������HuJ�]j�Z�����RTV-	MP��IFpY'��]�|�:^���6�y�����8N��>�pz�d���we7��tL�SY�o���
i��G�����/�g����:���}|w����K3�����0$�\v�t�"���q�\��������t��z�������1���PSs�c����]�@��bV��e<�M����qC���:
4>��G����wn��ZgB����$}oT�lqg8�3m�
'>=w�,�
X����!5�N�2q
v�Z��I�_��~�d#tI��Q����������\��U��x�x�&!Zu��YvWw�/�w�y|�P����m8�9��_�����\�{{�����N���2�K�|��f���[���p%j����CR��]w�.yK������J��[���K������}Y��y�%s�����R��m@��N��p|Gg����z�c�oV.���L!h�����u��}����*���}�����>�#)}2k����|�����<��;�����>�^�GU���59��%E��{�wH��E`������������F^7�I��pi�����������o��������G������|��5�@#07����j�5E�-�f�;��������s
��i��S���!�ot��t��S�{����&�,!��[���N��7�%�:�	�1����b�T���%s	G��8�\�i�+����D�2;,w������8�>������2�������J���z�;�f��	��`�:398sj��Z����w����]kF�\��{6A�aLe�G����6(���a���O����n������#���u\O��Kt`�����&�j1����S>'��M=��@h��rz<���B(3��t���p,��f�o�w����on���*e��x���k;�M'�7=z�����1�:A}B�����|\����^=�T�<px������;k���������/���}�O��`���;x9WN]Y1��w�%|�Ky'at=q����W�q���Qy7�gE���Y����&e�)�)�k�����0)t��?��Q6����lj�������F	�jX�O��M.��_%��-��au�$�WA0��;#��dv���Y��@OyO�����!N����s�
���+�<���Tr��������"S���Z� {]2����4��'�\i9������!�/{��E��=ZW�$�k�����CK�^�&�AO��TZ�C�

�
Z.�m��6���k���=6K�0
�p8V���v)�2���X4�?��4����O�Y�v��.l0�����6�i���G���U<���G�BLV^�G����I�!����C+�'
�Xu��
�^���F���@-W��^~�`?��a�����p8�4������:X��a��2�����������-��Kx��:��2�7+aX�$;hc���J��y/@��iYB�M�"?�:E�]JL����:�S�p��w���l�?����2mN�?@h��.~����Q��f�x���M03mL����b)�>���1z1�I��Y���/{�X8�ytSZ����{�l�X�~�Kc�,�,����W�s�/H���� �0Fj��x�hC,���	�p	��dE4G�L1(�m����(c�������D�xx@�l�*V�B���fU�NB��&lv!)w��*�������������2�*6x��������;\��+����'Y����������bv�5�k���Z;��foo�5X���j���#��������+}A�
.���_����=���WB	���l��/�1��@���6�����i���w����� VZ�?
��-^Gg%?��6��"��@?��� �T�������������������$��#.v��ZV��5��
��2u�}�+�*'SF�	�����_���'��F��"���� �V�����i�5�|�Tf�������(t�|!GK�YyC���_�3s����E�.Q��	�C���v{'���[h����[�J?�������M5�/}]�����o���$��#�����H�G��������v�{�K�f���
I��i�����(Xh�^���x+��C%(�+t7�a���u/|�����;���q�_�>�*Y�^��V�����t?������fs���
Zau��"���[�
"�n������<�ro%���0��cVE��;�����J�����r<|��(����k9tR�O	�����H7��,����b4��&0����Rd�����7/��J���n+�����:{�����l��N����9EiK�#��	W�o��u	�5��i�����2Ww_^S��P���h�1+�F,�I�g�\�r+�=����������O��_�K)��0I��s1H���e{���B����a��u>�c�������6��1X����
������kW���u��;�3���e��v���JgP�Y'.���]�\m�\��
n�z��~�������m��[-j�����#���%X�(K�U)��u^J�h0�B�Xr��Q��jT�/��A4W�T>~9��|wqz��|U
��r�b\�����``���v�����8�������m#��K��S����t�`i�U��,�����������0������;�����%1���0��{@���Ag{,Z�\��GTk�5��g����}���)�N�"�#��K�3�|<]�f2�z�J��%�.��E/|�bhI��zL����0�|6��7�9�
�������d0ejx�9�������o�f�������v/h�X��Y���� ~���]3
�"�D!�[����?�N@����������2��x���gy���&�����YBR(��������;�������������F�}V�:7�g��n�vr�?�������=��vP�j:hE����p#�����0y0����$����?��K\Q�kh������k��.�I9����*��NV��}����3��r�` �z_W��F�����q�g?J��I�S�q9�?����a�8U��S�5��J��uSF��\�7h�d}����Tq8)���d��0-~��N�aP.3���.�Oa�J�~��g��y`����+I0EEu��m�o����0�C�U.��e��D���t��J��Xw�u����8;��(7'F��$T�'�����
�*�'�\A�:�����������8����V@9�$������_�R�����0�A�'f�^�G[�_��H����^g��;��0g�����C�����?8�����$�������I��9g7��{��:��N<�������-��3�Q�o]�p��(�EY8~��?�$�a���j��s�c
�>�����LsJ����i�@}I�!��~�Zt�E�Zt��3��gL���O`!�J[���k��3���
G�$1��`?o���&6�F�&�r��EC=S|�D�t��yK�0�l��x����3��W��O]��(U�~��s�������H2(�����j�	��B�f��jr������v�~���779j��Q�HS��C�p�m�j��Q#�xi�� ���v���v���?����/+.��h��V��^�����o������Zp�)�"��L��d���S{{w3�� ������aw�>B�NEL��w���7����\�Jo
�k� �����S�����t����f������V�E	��EI#�H����mm��58�"*�
�z��O04!e��������7�8�p��
���A�s9R���$�#��gc�db	��l<���^�C������;�����]~���?X&����%&�����;?:8�}@z����+�ie���AeKmi�n����$�^3<����2*��&��wA#kGx�;����i��bw�(���JYS��	7�Bf
������-
T�|E�w.�/���k���vd�enL'��X[[[2��q�m������w���w[i�`��[�Xe�r�/��pl�{���!��������|���:����{�f�k�u�����fF^~���� 4!�X@�M+��d�����~��������,���%�@���94>�bR�sz��;��3�Sb}'��������7�S����Bn��Cwa��K�Z���*��x����f��+������N8�o6�z�~�u{�~���`�q��@�J�7�����}��-V��Fi���d�v���,��[�k���b{?�D��j��U��R�������C��JfL����+LG�Q^��^�R**��|.(��7��7��H��pL?�~9JQ����z��D�ww�M�c#�5�M0�S����QC�����������B3��l�Z��0�`v���h�������Sx��Tu����s�PT�n���^��J�����+G�
� ��v��Y"�
��:�d�v��O��y���!V�����zC��6T�����������w������r�����};��P��2��f6��]x�O]*��U��A�I�.=��� ?�r��r�n<k��I�W�+�!`��L���3��Y<���8�����}�.����*ME��Z�7���&� �4�>S����S�����`cn��{�)F���B�%l����S�&�so.��G��vYh#n��n�/�[4{�4���F���]�b��Q�:��}�7N�k\�tm
W�9qja)pDq�=]OU2�H�N����c"?m���A*���M�r�x���v��K���(���l�4b��1����d�	:����<KY_�aO�I��c�j�G�Fe�H�N�[����������H\?(g�$�qKKrNG)7�9%����*C�J\-1����Y	��b���D����X3V����;:<9�$�*|;nrK���k�?U�~i�\�Y���k��yEju��\N�����}|y�}~������������D�u"I/%9�
�sJb}ZF�:C{n�����+w���tD3>�c3r�e-�P<[��Z	�7����M}���gW��?���K����R.��6�������7/���pE�Kme�O����[����C�8�<��	�����bQw6����w�@�K�!��_�^x[�EJ�=�^Mb���JfL�%�zf�R��?l"0hL��s���/$�\��/&������E�k���4C�5i��Jb���,���Z#��
.����r5J�
��,b���'�}��R�$���f�3tt��G��7�c82
�;��������3��yL+u���_����b��
|�@��1UZ#f��{\���t�����W�S�J��C��8����S����47�B<���9�E������� ����E�2C�(�W��U��a5�-P�b�����D�/��&F�IF�`n3
��C�������De	�)-'UaZX�RA4�W�|��7]d9�`�$$���E�G�|$���/	�q�u��U�t`�K|�9��v\d�����m�efd���h���������*�
{��q���n[��]����X�8����l����3#�����t_
[�Y�(��~���v�*{�s����a�%R6��>���?��}?�����>}��3n����;��S�~z�J>���@�6�������v.^
pg�V�A����d��*C��p���b\��c�V�{R1����y�G����k��y	�\�\�7��������Z~���������"�P
#��{�JR$|�T����|�����n&��l2&�)x�K�JbU����o�0�0dI�e�U�2��j�JN�9]s�#��`��J�~_?s��DMoo"��I�>�w����[�<���s�G�'��t���*u��\��/���U��"���?��"R���,��	��U�%����S^��'����,�~�g2�K����T�W{?��� ��c0D6��R���N���h[��dP!�~��!�e�D"�7���4/\��i����)>���|6��O��,���YK!%����I��[��fr�7e���YC%�4�J)������i�+�j�L�j���O�`�A<�kk�t�|�>������~<����h����H�o�hc�����&i�7�
a�Z���$G>r���J�H���3��~7b�"�$Ou��`
�|
Be&.�N�My��(�
H���Jg��$-GUz7�r�4�'T��l�%��a:J�<���z
��:��K���EM�e�/����S�Q&�-|x��Oo�8�%�qX'F�U����W��X�������<G�����0������X+$�M�� r����;l�����N�%��=B�4�'���Qb`-�����������Y�G���eG��6�Z�0���<�����Mx>��
|Q
������zD������u�p"��C��,��)�� c��[>��V��[���i�����x��h7N�kzz�e+m6�0J,�jR���v�W�B$��n��-�B��D����2"����!�[��N�CJ+(L�{���F��cF
���<V������F�Z�N����[5)�M(uU�X���N?Ri������z=�_m�;I2YfS�M��8������������q-�U���@<u�EI-��&39�O�����BK����&��1a��_^�>��
�	��$n�C/������yJ���
/Is�����<�K*9���=������j;b�X.�p�u��� Q�������a�������sft]ib`@��,��Mq�--4o���,���K���L�0<��O� "!N�/�L���kR�M��+���v���>��]U�k��������O0�.����2^L��N��'G�--�I��*)����uQ0,�rd*��AppN�{X�6�Z�x
��D:y�^J�0���A�������K2�]��>���D<�[��������V-@�������b�7��yLrS<�[o�:�K����Y��d\�����8�p�2��N�+e�:�'L*��	�-���N1YR1���G�=_P��P��HO���
�=�`95\��5>K5�t�3p|�8M#�9b�L�G��&{f	Jg7T<[��a�6���=|�M�
�I��1�p�ip�
���jyW���,�.���@���c�|�
�0z4�u{���fEFt��-qP��I�R@�-uBA���h���B��{����3
��G	�QG@�#`4Sz>i<q/Q�������
�Mtfl<��}�O��P��iV��W����6��_�8�m����v]`1����T%��A����9e�Xm���zw����[S���FS�c������s�vB�JW��l$����~"��$�z���K���7���3�/��������������s���iwx2��H�����z9@���r}�����w�
�&	����O�� �������7go��=9=}q��y2*�1N�7P��,��lnz�N�K��<2J������h�����&��m#"y^E�gsHG�V�����.����Es��4���OM��� �M>���}��J�7�[b�Q\�,!��L;����3�J/��S�Uu��\:U)�C���r�%M����$q��t�j���zw����#)%��I�-�s��c�aA�{u_,�~���������G���g$�F�8���+e�U�����,n(:t���|��2�{H��a�$�b��2�K���-�<��o����|��z���o���_� zi�0���"���e0�9�=����Pr���:�8��xlSvU��d���������gVx2�Y�B����S{C���iZ��B=.WFb���=�d�s6d>w��7nw�~���=S���������� �	]�����x��Z��)��D�p��J]ZM�(��M��������{����c?���$xQ~D���P���Hq2C����{Jx���#��YS���IQ��GD����2$�9������{��������b��j!�A����o@��>f
Qs��VUxNH��WS������O��gw1g�����f�U�k(6�^��5�G�)�L��I���6�(�2My^�3�������Jhcm|�i8V�47�V�N"��m4�p?��������Dt�����:��*����F�p�5����x�^�����r�!��uO�q��z�~�Y���(�z����xc��Tq_&�{-�-��"t���/I)HEt�!�\3����������T&(�r���Ex�E����
S���X�����.��!.��{���]�������1�����5�A{V &�b��#�W>�Z�6��q�����S������D�)P���A�Q{`��M�.��
L�G�����K�O��	�i_�M
Hd�o2<3}����F�cq��9$���/�������+p�-�[��TD�ZW�k��<6��M5r�����NB�%�r���8�t�H*�[9����`�M��
��������U�qg8#w�"���{��������,W�������PyT�����E��.�-�EI�>H^�o���$R��]+QxW��q
�e(�Ti
�4,�d�c�����O���F�)���V��j�zW���������P
������>�e|��I����/.�������_t�D�a�T9U���,�I�c��D�+'U�����'�`E+�/�A��eg�i�j���~r�����G�;}4P��0��
R��y�"��'@�G�����UK��An�GI�:���������'��
�!J�����P��-���U��A��4#�$
�|����3!f|����	�)�pH���id��g}m)�
�X@��j^����V&	[��Y���8-QI�{nj9��1�Df���c�.�>�V�F��]Da����VMZRP�Vwb|#	�������cb�y#3?��G�;J��.�oj�(W�\�U^�#�����di�����_H����4K�;f}���zfw����7��T��=2#����V�:�?�)�/s*?$�eQv�#/4�����@��6�B��2Hq9Ll�5��8�N�����E��(�X�������1�����=�;-�?vp��o�
yB`:�xr
rQ=����}@u����������%]vh�.�~�:}q,��z��0�������A�{�
��5:J:����o���]x�n��G������I���<���T��p7������-o^��J���Y�GE?.i���[j����t���hI7�X��Q���[��{{-����G�}�fu'N,��xC%�p������2�0����z����Gy\��I3����u��|�qfe[��}�M_	�rga���e�*���W�iQ�5Y�$G@nk�D���;�H�;�.�p9;#��P�X+�|&k�B���e"Wj��M��(O7�RF��D:e�d8�i�����G�5x
��k'���> ���A��N��J�����
f&<����Q���9;�f}�$TszG?�1v
K��'d�5��QC�NR��^~�����,��c@�=�*��C=�^���)�[�8��Qnq|�>�k�����'�4�	@p�P\D*�9�`���~��2[?���d�U�"���p���N}g���%�=�?�p_	As-�~&��V�A�^�}j 8	���/1g������k�\9�[a����["����\�����`�E��f^�^����H������\U\?)��-�y_�^�T�Q��)u49������?8���*�������(rU�uU�O�3$����
����d7N������!�/�����;�eY���e��;���N����d��gQ���S`�Z��I.���c1�S�g�" �h?�OI����x�iE�2�~� `>'c��9���~���p�\�JK.A�S��aYF��_X�!
��h+6��*7�Z�5���{x��F�.���_%��L�JM��-N&|Po��x�
T��\x>(x�Bv��u5B��Q�6l�+2�m�
F������cn�*~O@!������E�|o��J�-���$�`�����
�a ��O��d��t�>S�S��a�����2��bJ��~�g_���n��e��uoZ�����,����W�������~��f���y�w�UI�^7�l���i�
Pd�I������n��"9r@����-��Sc2��< ]�~��3/�a�p�J)|U�eUU��u�����,�OC����U�9���y�����yU���>�E_�b~�9��U{�~������\v���~~��^�N�?M9�%�AC�7���Y���*��>�+J�ks)u�[0����Em�(|��\ZLR�~�je��*�~�������&����7�!Q�w)��	��W�]��
�
�����=Z!=��$������:1g���(-�H�]6��*��}D��}����cs~���/m�:��b�r�.>�=%�7{�����m�95`Mn�z���*��M2~���R����'�P�+X��N�k�_!$rS�x����`*��aS�Vy����Qj�v�����������nK�Z��C�(���4���s"�[�������m�����Y��Zg2��\4������?���������E��S>n%���K������2�QM�)9z��m'"+<��z�������_�>�s�}���������_�=;�6����gYwN6%��f�z����F�U��^d��*�R�f�������G7�V���7!o^"�lm���b �H���";��f���6����&si�9�g�{��
s	�\=f��2�r'X��O/���� Pd�j�(,��e�����p�i�\��ma���:C4�k��=�A�����d]���0�sl��!e�i/ �����N�s[�0�D�K��O����������2e(y���D���1����������>u�j$�wJ��^�4��_�q_�Uy���*=����7�z����B����-�y_p��4W��Z��j���H�a�����Zpr�{������+��gb&&��n~3��t���0HD^�5���h%w��
|�	��s�/w��{G_Z����N�� �R4�o�X�����,��W�c��������R������3JSwr/`�9�������)[n���P��(/��E9$!".��d��:�K�KZ�U1E���1���w��&h���v0?Ln�D�^����j��<��p������9�x���~/�)G�r��<�`b��Z2*�Dd��t�H��I����.��>��eB�
�1<�����|1j�\
o���TU�YA��k�.��+2|�<�����8X:g�r=?��s�����������aG�������F�������eoF�bnM.���8�O�����2���$�>�)��!xs��Tt�	\
a�<[���"}���x��X��$�2��Dv^�E���u������Y��[�<��,���G���.o�����jm��\������~v�K:���Vg����"L��8^z�i�yp��5�3�y�;j��%#p�6\��yYt�G���&Ul�������A���3cL��J�%��x&7��e������%��mu����^}�y�1�m������o\ea��>���f�i�;�b��`�X5��7�j�t�Z!J���{K/t�b����s���8��:Q�����?�p
���	�\4��������.���:��9e�R���.��x9
��;U[,�4�����r��L����Ar��W{���ww�4�fS������7y��Qj5�;����`0KP0�@&e��su�~:K���%yc�"t/�c���ej�}����������W�|��.�i����zl�
���L��(�������kF}Dl�CY?L
'�����]Jkg��g�>,�v��,#eE:��q+a���U����?�#9Y'3�tl����1O����6�K��p�:]���
����}��@��w�K����+�i��-�4]���WT�<S����+�F��t���.%VMO�9.�4��S�=DWA���]_1��~��
����?'�W^���Q�0q�{�<���B*$R�����;����/W��(�x8d���������_��N�C�����<xf+�6Xo���n�����O*4�X�$�6��W1O|�C��9���x���I$�t�P!��N��v���=�`.��6�QD���s{Q^MS[R�H(kk�Ky�:���n�0^����<�n���-�����N�^W�OqxJh0�%�����u}�#qS����^QX��B��[5�@��	�r"��X5�E1��;�U�GUiE�G������D�����ba29�CJQ�A�����G�"�0�������m@�V���
�"�������}Dc=����a���]]�����u�B��+(�)��������e�a)P����:?�������-�rDB������<�yZL�M�E�t���p*
x�%�^e��1h���������b�/�O��F���4e&����&V'3 9F��I	3�S#�2�R�WrZ~n��G����}�Q9�D����u!0���t��o9����/:+��&�������*AU���b��yJ(�*p��)U��a��7�'Z��,���"o�}���S^�-)���V~���J����L�4rF��\����\����hm�T.�l%�[��e��a�R�/����V�Z����x�U��r���~:y���<���uQ������>h��Q��;��?m;����Uq�e�\-��,�7�n���S(����r��q�f�����h�:@��;�m�wp?�Eb��#vt��x6���x�7s�o���N�z�����dA#` �����1YY>�6����(�C��2����Z)U���?�m@�i-���������,_��������A�y�u��|��� �y��.�G�eP����v�g��,�N�G���DoN�0��'��b�2���k3��PG��3Q���`~�������7/������X5&�p(�3&-b�0���$����
�4�C�=7?,��9�,� �`�0���8r�g��H��k:�^�������F-�������-�5��]�8�����id�������y�'��
�_��*c�H��*l��,������m����|t��u��#���py���|b�A����dn@�4��q_8# ����Q����}�	+#�
��YR�r��J�����L/��*<+�^Nsi�.�<ZR&��-,���C2���/v���D�XdI�ON%�d9�M*�Q�o����a�[y�)�DF�t��x�q�I�Ev���*� �����7����P����Q����tin����L@yI0�O���V=�b�������b|�3X�")�a2g6}������+�^��)���pw��^��0
3��p���o]!�t`x�m-Os
����`L!��q���A�
�_���I����t�e�M�P�Q�	��o��+o�Bx���c��/A���`��O�V�y�����\��X��I-z\qb��"�f�L���G���g�D=����W����)�������V��e�fm�������e.1�Lw�e;>{�Bj��UD
	�,����u��gyt8��h���M��
�3��Q�� �Rk��j��K��I����z=2+��L�_06��1�y"7�{6�z���.m��j���Eg�~��'�����-�'6���+��*TU��T��x_�+/x�o�c/�U,�����f�����&`�M���%�[�?j�!��!H��%�aB����(5,y�	��L!<)B�&����?5�B�}qbA��J.W�&�.p��|��9U0<���]Y���� ���Qr����"�:���r�#@�,F���y# ��8���iP�4��L5�$��R{�p'�L������%��5�i��K��u�@��@�D��e�9�[).V��|�-&2F?m���!���wC�u�����Q��y�����rt���w��4+���dH�G���*�i��3�5��@����6^.���\[{X�q���p�R�g�&,��B���y�����3X��{%�(H��T]���U��-D�U���������x#2���"�����c$�8����p��.9K�i�X+V��gP�tRz%4c����3s�	�w���)zK�Fn��X���>������1��l��zTv�������.���1�+�&X�aQ�yg�|��gf"��mYP���o_\�m�Z��?a���^�9�vg��NdOA7_���R����,�mY.x����.6�{r��by�r���!,������,%��{I�����'��3p��2Gm}�+b�wF�o�u+��-:;�s�������P5� �[i���V�%������Tq�(�5���L�u����mw���p?�98P������z��Xn��Z����TN�B�:�=4VR���qB����&�:�M�:<�4�iZ��u^���j����"����|��x6]i���Z��&�a��'���	\b��&Q���g���8=�:{yvr|u���+��J^��wW��O�G�������7�o�.���^��t��T�Z}��T�(�z��D�.}���;�
�-R�1���zE\�YJ�P�2r�a�I'gFk��HK����q�\�hr�!���K��d�������]��2�&���v���R7��������{�O�1/���jIDc�c�H��^R�]DE�_�Bq:�;V:�zUs@���J�&��6U�-/'.N�E�����"%Z`�_���p��s9Q�m������`����`�
T�F���Q�]4�6�#��"8��S�e6������oJ��b2b��������c����9$Uk����=�f���p\���G����:{�;8�2��v�Y��V����T�N�;��M�i�1x�at�zK4;����>o�ihEr����[Z_����z��
n��@������8Gy��>�sq;r[Ja��+�m���)�c�)�5MM2�+����Q�td�96s<t
u{����F������J!�|u4F���h]�q�L��z�'��D_5���,}�
������-P
��W��$\�CA�$1�8����8@�3N��DIJ�1 m�h��"���P�'�`�^�_`"ip�����P�7�q��#BK��3�St�H�����p��Rs}�~6z�`���r��L~~�������|
�]�.�������{������������5�����S�I����.��>��yaaN�,�f��]1�A�����H_����hl�������\d��lT1

�������sVH,v�
?��=�:y}����%� TY�[+�y����������'ZW��^�� 5o���8��f��c��f?�^#�5���I������f8�8��|�*|W�sK�tt��-��(z��.�]����=�7��x�l�JH�e��h1��Lk&���`��zk�5��%�_(P2����kB�Q��RY��;��J�"#a+��\�������r\nq�5�o�.�����7�w�N/6h�����c��M�W?�d3xda=J�K"_�MnA����CI�d&��af��-�\6��{ ��~0�D�\{l�����]Uz��lt�)^0�yn�H����,��.���M�>���sq����F��B����W2���D�P�L���Ya�)?��qN���if�a�k�������QI�m�L����\O�z����S��-�%
�t9�����H������DX���+Qk��P�!�:����>���(���)r����P��������K�y�mmG67��bB�D`�BK�:���WmI�����I��\E��,^���"o_M����,�$�����y��kh�1����_@y��4~��4��5�6�Yh��5�%��6�z�mF���kZ�#��9H��p�:�RQ����5}�����RMpP�I�?Y$s�e�ze�c&���i�F��e�4'-�<��p&�PwX���o��.~R��{
�F�������-�'Uq���X<}Z\�A���h�P�P��D�&gC��/9���������m�<\��������Z�e�h
�'m�MW�1�x��S���La�r��r�.��\�/p�f=r�^QE-��i�fD #����[m�-�<~�F]���`���!ek���tY�������|6��ed7IR�����R)���H�7�z��:�vq����d����7�x��i�dx�]�Ub	� �tz`����R��%Ki��.�:0��$6
_QBY����s)h.�t��~��C�U��|�s��q!-2��{�Z���Oe2���r�h�����H$b�R@3t"�\��:V�Y��i ��=���I�b�
G���"�m'����[t�3_����s���{�j�G�vU
�7U�.Y��.��Q��t/���x%�v�Q|��%����1�q����������;;78\r�����L���.��\a�}D�b����A��~U��h1��<�����������m]F$�r����2��|��I�h��mN���{�hS.B��s���K��^~��<�R���xja��j:I�P�.j�I|�J�n�F�p���iZrF����)^�-=-9[��%h�#K�0�����17�-mN�&�B�$�(�k\�8�������������v��on:��C���`�&��3C�M���7�o1����C�`�E���g�/�8e}%s������J�\"�������%bed������G�����8j����P�a�y<�V~r�+���-&��Rm�� ��f���93�v�M��R�{k3^w�u������������MB�9�.���c�mI���Y6�oL�|[t��s�x�vnq:�N&mf@���D���1���4K�;
�����M&C|�[�� &��C�&���Vr+5��/rX������^7�)V�p*���H.hM�M>�i����q�1e#����x����#1/t�Uz3�fq��������7���1�x,W��)�KK:�����v8[����{,{�d��,���,TY]�X�9
@t]��({��(�����HT��?���#�]`H:
�Q'c
�t���\�1,�I������k	�3�f������4�~�y��D
2Y�H�����A4����}��,����Y�M���
�\��@�,]���Vn;��]F/3��D+Pk�E8�-S��Q1�)��C��&�G������
�^�$����U-�2��B+������87�m/�����Wn�7�-���q"|�l������h�z	�k?A{��>�^�W�� �w�'��.�s�l��r�f�v��� e-#n�2�?'���`�L�����"m������6�����=������m#G��/�)0���,�"��(v�,���"������IP��$�����~�����)�����'I@w�������8�P;���Q{?�$����
�sZ��Q�n>t��[	�q�\_�Mp�Oq�e��F��K�������E�����.e�|��rs)t�(�0����q ^�zWN�|z����N?�4�i���n���L~�"���xDw����
�r_�P_���H7 �t������n�={����w�N��������t�-��s�e(�X{�.�m��w�LF�������������_��G���W���9=�$��D��&�8JF7�hi	%D������{�L;G�:8u� }K�@]�V���h�����a��9����[�VE�o�|��D~����8��spP=���;�L�rO������}}qqZ�T����`!�=$�+Z*�����&����o��S���Y����r�RX�y-�v���vKm����~�^�vE�a�!��]BM���PF�~�9<.�s|_<���l����������y��VN�lX�f���3 �����O}�j�ZQ������p�x��[�/AqY�K[�HT���O���FU��mD�@/�������7���������=���90�g�d���U�Xu|q����.8F-S���=����]��!�U�~wx�9v�;_��.�~r~����v�.@��������+o����-���E	�W�'��A���w���Zl��\�U�x�y�^wTm�K�uoks�_�k-����]l�{�^��K�o�����%��w�I�
�E1/��W��
�S���v�#�����%�c�@�S��OP�'���TI����=�C�K�~���	
D��p2�����HU!��0%R�*\VR�D�Y]By��X�s`E�8
P�8�w���(��y*KfF���^�'�W��.mfDJ�'Rz��4q>�NW>������B��V
�%n�wU���S�����|��6�����15*5g@'x�W�={��5Lc ����&��S��H�������g�;`B�Ef�~��6������R������"�,�d-xbP��z����!Ag����%�n�J������Ex�����#����2���)	��fO�Y.D�`'h��I����"�Y2N�������3�y��p�b����3u1�%_��5 �9�hO��i4I):���=��\�����>����o:�M�$�?�^��]tM�[�����f���%p����W�=��!(�pvx��9��.��ai�j���L�r��Es��M�r��;�f��=a,�`��B��?�TZ8�/s���b8�~��%��u�a�KfUS��n0{��g���Sq�g�oP����X&������;8��s�)`X(��(�jK����%���Ur)�	��8��p�!v��x�z�VHpP���M���#��4]�y��!�q�g�>jE�7�A��a����4���?\\��{�Qa���s~=L��5r[��`~GM�g��c��b��q����>�p���v���7O;�QO��a�=��v	��V�	f�|QKn��bA�x6l�'TB����%Gx->GwI�F���������bQ�ij�v[r����3�������*���S-���dY64'���������7*�g�t`z�����$w����	��MhQ����������G�������z�8{�fN~'� wr2k����������NY%K/��R��t#���g����	[p��Z�]�"4�������.��K��99#�ms�x�_��W�����X,�jR�}F����4	���>����bw��?��*����l{�pL����y��[#��LFA�
c�C�|4�����9F+a�8����'���1�"�Q��
:��<�c���{#A��B{���q:���Hk�7����0R���D�"�~B�����1�p���H��&'�9i�*�!uG&����-y����HV	q���(������	7cO�=��>�^�jl�`s
��G�z=m�����z`�L�����g����������$���M��%I��`lj�����a/@�Dx$����n���=��y�C�9s�l��:@B����#
��4����Y9���I
�9d��09��
C�$��c%\�"��>��/PV���/ �1����
���S�&��R���
�Fq_����2�����Lx���sE�o����k����lu�����n	#aDF��)�4c(8�R�NV����&�M��o6[-�
d>�'BV����_p~[J5<���~4�4�O��J�>�D���2b��h1J��P�Wi�!!��&��v����m@J��SoaB���7v�(�W4`�p�D���!�H����T���|5����{���V�dN���
�k���c��<=������P�������D>k��^8��j��_f�1���3�v�[[�F�������\��c����c�5��V:�TC��
�w�D��i:����Z�a P-I>�6o�@����Y2Br0@
���`f���FY��9>���oVw�$�����y���}������Y��\o;���{W]df(6Vx��@d
�<�h���c�O���
�t���W�(����0l�.�b�X��O�,��$���6g�Zq�8��{#{f�X3+���S�/�������Q���:�3��2���UJ��9��)��B�]u����*��W=��}:���H�R>�l"F��V������[��q��i�q��\z����l����y�$�'Mn����v��q�;m3T>=Vr�,��l���
q��X�������Eh}{���i��|�H{�����[,8?9�x�Af�����'��F��t1����<h���85
���~Yr������F��L��I2�\^�`F�oVz��v��A�S��Jv���Ti�HH��g��(�p���#���/��e�i9{�P�T�+���b��gL�m����B����M*4��^+1�xc�(�	��Z#�a]��oc��9>$�b
��X��E���}y��#������Y'o��U^U����?��P`��$co�lm7�f�RCqw��&�����P\,3|��,����r��r��r�\M�w6=����%�IJ��e��k�k�=$�S������B�w�Sc�JB6#��M������nk�gEXk^��O�6|����e��>�[�����[��}�����55�A��j#�B2n����(�4mi�m�jB�:4�@���#f@\J"��<������v0�t���Q���=e��`������P
njD�y���_DbB���1�� �GnH����Fmm�sE�l<K�~Dh_�d1qj��	�[����5jy1D�����8d�j���U�n��;n:i���x����pk�$�������������o��� +��[��Dop���S�7N�q�E�@3���J3���qp8���W�����^Z�\Nl�|Y�J��X����%�w'��3�����&�r�I�	g��7)���IR�b���1�&�nFp���y�`�ek�W��Id^�wq�V���V�����w(w�+��������qH��W��kT�>Mg� u��t�
G^����B|,���U3w������ ��8��d�A�KRRm2+>2(�[�!�4���h��E��66�&MiE[�]F���cU_L�F_��r������V��z�L�k��AN���-������t�a�4RH�Q$�\��H�_�|�����'�o��5�������)���O#��:���$�
4��PKL���VO0���Q���D�y��Y����g������]�-0���}>��d<$x�w��K��W����5��!�~P���sI����)|�,w\��.X�����J�KQ�}	OZ�)��[de�T&���y>����i^��2�Q��$�k����������Z�d7��Uj9�������0E��F'��I��s'p.��]	2"���w��E�;�)2�+l������������&�AQ��I���8��_^���]js�����������v�u|]���T����zj+��)� ��K�P�f��������Jn�/�Xr�������������w
��&*�
pe6J����.������+�O@27f����)�{s��w�^�����������lKe;3[�l�{�d��co�q���r--$�pM@�NR�xACm 2x@)U�X�stV�JW����`�����)m@��)}*U�\��0!�b���%�����*���l��$t��c���$6�gv��m2��o|'�����I�9��F���)kV������42We�_����L��u� 5����`�������B�aB~U���0�v����m�_�����*y�����W��s����]qN��d����Vf�\�����G���d���
�<VV��lm�t�l��XTp@v�U4L��`���O������h4��Q�xP�e��^�!�#9�,��\����n{o��T"U������+���R�-)#�����Y��j���q>
����6���q���t6��qX%��:�����-�q������
�����}�[��y��}��I�?Q����Z�fITw���}���_�it�o���Y��!��0�,w�0��z��7J|Q>5�RBd�1}'���V[������z:�$��{�KS0��/���4�a:�;i��!z�pH�KE��-(x�,R��8�E�7[����
���Q����u�"�rCW's@�j�.T��������N-��M2w�H����v4S���.k�?��a0#f\P#����Hq"��U�������h�j������@R��:������������M�%cj������9Q���5�:�a_��	����#��V��/���Zp�t����h~�c����BO�$BU
��/R������������38�	$*�`z�xZ���v�o�#i��Z��ss}s���?p�S����y���X�:�=�8;<9�BG�S����z�d���B����������x��
��I����2kV����@�f���-��@�������k�`�V)��f�h�����9���jS��4s`D���,0�K;,!;~(����hM�.�./�O�� 3|��M�x�cDUZ8W��	��"�Gi�_}��tUT�l��:L�-�m�$�5X
��S$~�q��������)p�]~A�!�1U}��A�.�R�W��� X_�-pd�Y)�J����t0K�SL�,�Mg����������V��'�T��0 p�gk�����2$�ixe��:<[@S���(�gl�a3�� !t&��}!wR�4��K�'��������{r����������f�Wj+�@L�0��2s�$`�hl�����O��'�_o���/o__�>�Y�� ��/<�OlY�]��L�2��1}��G=B�\�hDM|O@A�i6�!O&<|`����~��v.�i�T7�������
dK�����������������R]<(��,Q��1^�@#����r��Wz�����u���H9Q���>?Dr	P�$KL�4F�3v� `�XR������ef�2���S��.�N�^�2���>hA������D�f{B��mW=��x�d�S���L^-��
,�{�����Rd�4D���cM"������J��9��`����h4�����vR��K��B�? ��+�
�#����d����aK�c�wFt�Q���l��\8��Qg��1{{�����f�����k4�����v����F���+7�������M�#�}�n?�6O�U�h�(3�����G�:g�p��W�����A�����^?L�W��)�J�31���������m	td�%g�S:Y�4(^��\�+7'I^���a��5'�^������<�9����E
H�Z��|X�&u������5	�����z�:	@i�W���f�����lYxi5�2 ���#���3���\�E�X��6��og�{�=��oaQkss��nm
�[�bF�o%�
��0v��ql���Wj�� 0��1��^�E�c)���L�?��\��{|}h?<��Hu����4D������>Y�7~������K[�%O����t�~�&�_z ��0���^X���3YW�]
��x}}��J8�o��7n��d_p�0��w�Fcg�������E�!����?)�
��Ru*�1�p�v�W�
�Cz���3y�a�A`����Y�:����D���G����]wnj��{8,�V�d�Oj�'����:b�y��'kH�����J��M#hN��s�7-Q*�*�2��cmG�4`O���8^������c�W�h���~���{v�c-�_P���:}�if���+Ll�u+���'�:s��"������`9�)@O��1�����_��<��@��Pm�,��^
�����
����PFy�����������AO���0��"@����SBsK��~���
�{�_G
��F�������q��km�c����
��XyA�����m�]��T?��7YD�V����7pC��������@*�>R=��[\7i���b��
9T�f��*,���"LbU
��}�E���ke!��J��w�?�X�)?�p�Q2&?��"�f�w��< �B#5p�.��;{��l���
qCh�1���`B�m�E��v��������\l��0�]V����GH9o`���[�R*�qo1*q?c�o	��-��>;�t���T&�������<����ak�Q��q���A�K�&��������:����yn-�J����k��(�|V�p[�|K�I�!�
�j��� p�$WS��R�+TI�S�3~
n��f�&�\������g�z�
v����18Yg�D�H#z
P����~Vb��2P���������}�����J�����H�JB$�z\��_��(V��?��E�q�o�����r�n�����}SL�d�	��������s�v
�V��mM����`���u{�O 1�j��l���n����
���t
������bs��_�Cl��%��x���Gp����N
��m��Z������F�DQ"sfy��%B��75���k�Y��{C����ogOg�&��w��c(c��x������`����F_Z�.������w����z�Te~zs��������`��;W?�@y��
G�@(��3�-����{���j���<<0[����������}}�_�J5������j�� �������������J�z
���8J�5��!��c8�SE�;�I����d�	�B���a-��{����z�K�� A�_�
N�#����|�A�L8�����()��0q���'F��:$�#���|H�n�[3�9�Of���mrP;n 71]�h�h<�{��������t��e������KR�����J2���
�`�<�����N�k:K�ja��G�&��Z�t(7W���ks��Q�'wX�����n�8�Vb�2��t�����,�,��.�28����H�������N�>i�����Y��o�Y��	p���%��������%>7��3�����n�>'�6����@���c�E��3���U�{���$��S�A��Z�.��:
� ���gO{�S$����>��<��S���������P+[W\7�������_K?��tvv�9�,��f��Y@�����g��A��(~>��:o��2!���l�S��p:MW����L���T�[*��g��92}����b4�-9(<����c�������&��,_���Yr�����<��Q���rp�ru����G*���kb���D:���fi�d����g��B���z����<�����%K��J]=���� F�����D}�V��.���W��������98#�R]>?�l�2��gl)%�\9:]�% p�������M���HQ����g�
sZar��_�l���`���$�!Y���!?9=1B_S:S��G
����t�h}�[L��S�9�����z(i���N��Ho�������]�$�	�@�����H��	��c����]��2���N������o���+!�P��m�]�@�W@������6��"[�"lw�&�}���J��z�F	�jIK�;���*B�����M	��������Fc;l������6JT�V)
���*R"l�l�'��'H.[;)+�)j��:�;g�_��U��E�����K�N�ytX��1	�0�i�����G5��d�6H�=d$��
Q7�d�C���J��@��WxPI����P����2<�(A�(Q�MG&t�����r���m���V�ppB��������[L6���/���xW�������|�
<�
_�:����N��Rb;���N��,�/L)Cz*5��x��+5_xR�����2��I0#c1Z��m47��B2	<D�%����l��yv����q�-�d������a���^���������t��%SR��61
��6�x8�N�L��QFe�{���x�qr_g����lq��K������\�u�����N �1A,��2;Qz�;,9dJ����Z]M�����2��c�&�:l��g|���M��>���g2�_�}�FWJO_�,�P`�P�c��G
��:f"��yFc��^���>xT<���g
FnGd6n��H\{�;��f�v{�=�#Yqy��/S��?�{j���
3�M��N�8YT\��/����w�&�}7�F�Y|{K����4����#���F�����9�MZh�Uw�Tl��� ��&*�U��"�SF�<:X���j���UU������1��b��gL����b%��/]��@�e��3�.~_��'�������@�K����������� �;�j$�{�\�fKCh��<����!zH��N����������mZJ����!B���8�����r:�G(��y����6�]�?��x*,cR������x����Qk������[�K���B�������1c��KafU�������ZIU'&����R�PIJ9��H��j�����*�`�7���d�O��������x��H��c�����Zrl���k����0B~E!�������R�}
�)
�piD�x��u���'��������8S�_�8@��?�
�&�b<a;&�">A�_��G�c�QH�R+7�V��p�.K�L�(S�J����OR�T.�.G���f(� ���^���.d%��Y����������G-)�.�~Ol�"T�E\��-����BCX��H��o��<NX����U^��[���Z��
a*���-,���>hpa3��(e�?��y�L�	��6��������)��%:�������Z���pw_���`6[����@;���������0�4}�\�H*&�?��p0��+�����<G�!T��W"���X�X�.2����7V��~bc&>x�
���}�����e���\�;�-�R���p
>$�Q�=dl�k���)�5�]�+�x��H�7-�����`�K[G�������&;��q�
�����n_�������N���T��>bav��N��0$q�{%F�5'�x!*��X[,_>=9���<3)�:��~N�s�9%��r�O�'����E����Z�X���MO�Xayv�+�+�E�:������I������s��������]l�W�L�K�}�r�2��o�����c������e������{I��}��K�G����>xz�^wTQ�P������J���������������M�:\�����d�������oT.Es~��kx3�.�Q��i������K���8��6;���U��j�Oa�-��eg�*�B�WV�%�|E�R ���f%�P�(U��ejyQ��X��Q8~q�N�����,���p
V`�&9c!�4E��A:�wcks��Kd;o#e�*f��a����������R�V�^�p	U���T?^6��1��+@�^�0H�p��=����N����J�;����PFR�(=q�!��(�������h���j[�3[U��Nu���;e�&�_�,iL���JQkG�@�I}^X��5����{P9��&FaO����k01��+�]i���
��0L�f���v�,��O�|T=�W�X=)!.�����<u}{7������[[%�EW�S�~�I��))�����jA�&oN�O���
��>�9;��^�g�\�Q�����w7�!< "�x����.Z���8�x+���������i�k�=�yi�^��p��aW���b2���e�>�����lm�z�F������b��5�'�\����������M��FN8�!8���y���*�������^�Q�9W��+������d��WVA�|�0��y��������J�?9��� �6f�����n�����3�(��]�����^��VbH�7�no�Wt���^�|�"����K�^���6��$���S%���A��\� �U��n���R�<BO�.����d
:)����R��{��,��y�f��<~p������f�t@M���/8�|M>�8BF��@ �Z�1e$�k�`_�U:�f���������B0��]��`���[������
�b���!1K-���j\���m����h���
$�����L(�������
p�v�� .^��x�XU4�(m��KE�J;�z�^m5y7�Is���A_�������]�l�^	��(QC�;���}�..@uoL��P_��N�������w��S���`:��[o15;Ug�%�[A���n=�o�=��}�f������������`�[�d��������*R!����s-�f_h1U�K7��CD���L8
���m��Mn-$��=	4K�V
"������#����S��}���Oh4c��>x���=]r�I�:�]�3�����h�6[�pw������e����
..E�	x�����D��b���G3g���-Toy�3_������������&�������
�#���z�W��O�������3�Q�������h6�{��_�qo1��3�zY��w��F��~��l��f��n6[����N���2[�[�����l���#h�Z���- ����x1�%%�T����=����$�l%jCD�)OI��O=��O������G`d�D��di��p2x��'O0�)��A�\]1�*���F �QN����q8	o�y���0���i������s|}�tN*E���I��t`���Fs%HTV}�)�"�D���RL���3l59K!����$U�s8��2�9�n��
��Fk��n�%i���Cl���%�2�A�\��L1���O��L���H�
�����?�v�5��|�q;Jz���/^��R�����-U�����J#y��jIv��E�D�vJuEwI�O��d�XqI
����.$����[���f��.K�\���U5��A,��^������;��dDD�g�$�0!@��-p����P����I��Kc)1r��w[+�~������m��SE��d�P�:�>`V@���R
y!������.����2�@L�C��n��wl/�8�&>����t����1��{=D����)�L�>%���z�����A��0�(D9��,�+���vg�1_L���ai���A��'��q�� Tp~]%���������uO����/A�(&���Z�	���?D��!���YJ���|���s����2oy*��E�0ya/��|��Cz*�]j�<����!$	�����a�{���+r������jbS�&�������U�E���&���g���f��j�:��a��E��
> 9��:\zN"vQ~y��P^u�����tIq�U�woF��.WP2J�CL���H��]>���W��Z���*@��\�<^��;�F��_>�)���&���Cj-
D�_�t����Vc��F$u��( X1�M�o���i+���'�	N�,'	q0�H��l���z��Wko9��*(�����S�g8V���~K�����a��H�U����Z�
5�<�,���'�J�,������~i&*��[�Aa^�
�+��h3;�����)4���	i`�\mH���L��,&���7A�."x>�S#�y����R�^P~8S%�d���r\�2[���>��dD�W��/M�Z�����M������#v�+b=��d4X�
��H��*"(M�0EH��(5���B�7"�c�r����L�=�����ms�=]�������x����L�7PJm�z��)����Yf��]D�n���/
qKB=c���Z��5�2?��u�Q�������uG���F�ylA�������!�}�h�c������������Q}W�����#:@�ctV8�(����������K	�������N���8����yO����O��H���n�N���|!d��)����m�).�7������J���������H��Q:�o�v�Z��yJ2��Vw�����0�������_���O�����^�;����� ���{���p�{s�#��^[;�FO���4���&��Ma�ReZr�Kf�l�x�}�B�������~�S���^&�-�������P��y4N ���"s!����?������Tqr��������>�������������&K	��1����a(u��')\���Nm���@��q��y�����v�PK�H}|Kn�3a�q�%W���� QCu���G�&2�|����=j�r������2B���*H�5����q6�K�H������	 ��q4RC��
`��`�K?�&�Z�`��9$�9W��s���S�I.K9���\��-�����V{���~�v�Tvh�O��#��O+W��4��-yC1}<_�X"��4�k�d���u��!�xBM�-���J��� �����2!�����t�'�jD �$r{:�����'9�3�'RU�{�mD��t�/�x�]��?��E�{[����>��4���?�&�^t)C��.��.K���k�z��b={�X�\+��:����/�/�G�
��L�Y�1N�jZT���?q�
��C���i��_�R�����P	11�x���:��X:k��r8?��C�C���h������0I��;��1�i����3��Q�O:�0�3��&���T�m1�����:����N�~a�v
��a��bF�@z]����N������^����A���d����	�����T��xK 5������8��G�o�{��z��u���s��Mm��>��<��������'8��xK�K�M">!�@�W!v���F-�5U/��@/�?��E��}�N�<<����6]��ze�����X���m�I�tTG���_��C\�
�Hgi.5/�l����Q�:!���WU�<X\�HK���?38�y�B�<���+��
cKC�	U ���}E�>�� �@Gg����!�
�#�8%���v1���&1<�F�>;C##���0McbP�����hc0�T!n
�uJY/��x�1���M/*l��
G�W(t��2���������!&�!�1:�k�A�rP�3u\3��m���Kd�t����4h&�V����}<���6G��t���Dd�U~M��������"#�5N���4�W�b�'~Wza�)_�2oY����s����6�����b���6�H;m�e���vTp�fJ�
�@��G��s:���v6���4&xP�
�q����G�
���4����z	������\����,={�F��}�=�}��0k�8G���([���?)�U*��*�`^��V�W�I��sD/ ����6������	��"�_��$���,���Y���x$�&�Af�k��J��}�X�q��u]#����|F��:������m��`�e������8"ck��s�=�Q������!��L��j�_������p���� �����`���"f��$��D�M��������	E�!�R8�����!z���F�����I�/����
p���X�GZ1�Mk�.P�D!=��6:��>���0f�|�m���UW��c�e MN��C�U��{�k�{v�z)�
J��)�j�6!�~�����D��k�����<��]���J!ph��~��(W0-�n3�������&}��l���"���5Kf�W��A��Wx/��h����p{��zB��[X���e���'��ok}y���n����6h-�8�#��})N���Af��`�L���Nt}��i���R�5�v�e��y2F}�8q�Br�#p�pN�r@%jC$���J��G�b�����D
���V�Cr'���m����,B��J��j?�q�`��2r���P�S��b%�����S���xy�Z.:���Ej�T���v5��x�ufMV��S�2�[?T�{�5~���>| /�^�!��g�c��pE$hN�B�?t+0\*L��Luv��
�^�E#�o'��	EX�j���Bq�Y����5���N�1�G�9C
�v
(V�2�+a"r��7x��_��:�/�it=Ws7��H��"�������4��
�C����Z
IL���S�\���S�w���d������l��K9�g�C1�*�b�h�kM�d?����r2G�]L^��kRn^�x��O�4F�����6!�F�p1BT��?,��7�e����'p�T��(�����	o�����@�{\�^L���g���{D��/�Y�����/�J���E��)�$p���T�����j�heP%�g��$������M�{x�=:�������
t.�/�W-���I����H����$�����S ������@��7�m]��s����,��<XVr4�[^[��}��B��V���47j��;d&��!��"�B�@���=5��<�P��I�P��Sx��&��P�����y��c?F��Q����d��`���f��`�[%y���A�V�:��DT=>�RU������L����E�)�����i�7�L��}l��8/�x�Ks�������3N��;�
���=-���x7���*�5�-
��J���%����F�l���)�3�>8�x+>O!�^2g[l��P���2hQ"~6�w-��;�B2��".�F��W���>���mD��-��T��U����,��m�tl����g��y�srZ�sD�4^8�f>��y?An���K����KKf����Ut������/=&�����t�2p-�^���
.��w9.^i��������� ��
T�i��o��sQf_\%_��0CZl��Zom�v)v���"es-W7���u�i�v�>����~@�V%in����[�����{�[����-
���v�
�@����u�	�.gI�]zTaT������	�O�A��S�r�g^�2��AL so����d2�;@<wty��v����QY�������#����)�'��{�$F)������o������f4QW���1#��h�/c�P�n�w�.+������-�<u�H������W�������z~�~S4����\�t��N�����B�+����*_�;T<�3�:ir�	M`ko����-f�3;���+w���F�����q���{`<����������.�P�X'��e�.�&�u)���T�NR��v&����o��r�J{����U�[m����Xv�M�&k9������.3R��*�Ke%�6F���sr�F�L�RU��IH���>�v�9H���\�h���b�$t���Ig�j�(-�d��.���l����kE����6�ljl��n����|��N��BW��.��1�}�(a7�����4�Q�hS7f[R`
"x�����\N4d��5d.V����Y���&�����6���5??��b��X������8?��s��"vl���B:��ts��c����8���3'{����G<x��u=~�L���_��"�C���u�s�)D��~�����V��J�x7�V���J��B�������ppY%�{�9>��E�P�\��O>��a���QP$�)�_M.J�k<��)�bK��w&���8��#q���\h����w����P�0��v����{2�c�-�Ti��[fj�0��$mK�f�"�w�N��������k�9�������:�-�g�}�����]`�����<���,��1H��Q5UWv�`���-0��4��l������oVs�/�QY����W�����6��G0��]��Y]�����'��3��k���������fMoZr���+�
D'-p8aC���cE5����l#�0�K�n������.�Y#A$%���0,��f�=�����B�QDy���|1'��5������DVz�gf�������6��[��r����{8�uwk�f~���*V<{���<S&�}�C8��Y�� �d���H��-��H�J:��o-�/+3���7�'k��L�����d_�#2���V-e�v
����xn}a1ak`zh:����>)c}L���aT�	K�T{��������B����{����k�-����I�e�����~.d�o	��QS�"�����+h�7�4������k�5�����9b�TU04@���W%O�ZP4;s�)��o8�:�	��|�mK���XO�u��������6��$(���}"
��4�����F#Rk��v��gc}��,F�e�<��7k ��6��G�s�6�.����gu7
���+1�qh`������=s�Lf�|]������z�����	 ]��4x�hOJ���Ha��o!����j`{B'cv�y?�?��L���p��O�Pw��X]�N�@v���s��K"������O��3.J6�;��0�B�N�q���Kp%���UH�p�;#��0���(�r�54w^c�/8��*L~`c;F3�x5�����+���C�O�pF��a��I��
�$o-�,����'����c�]�ht����Z�}_�VA�jM��_!K�#�	��<���Z&Gv�8z.Hu�U.@�
K�B��2��Zt���2Z6^���)�s���g��i��q@�V��,d��W��?w��|����Ch40�47������s�X���� ��Y=s�W{`�����Lzo�O��8��	�d�����$i�0t��@.���@�!�$����R���A;K*���	�t[8�=m�K�L/�d`�[s������D�r2�1�r�W+���]��47�0rT��G�5<�9qC�9�z,�$���%��H�������c��i8�cN����{������`�uB.D�I���0!�!���r�n+�7��������# �����H�]b"3���[�����	�8��dHi�%7�<":�����t^pZN�d
63�=�����D���#����Z�=��RV(���t��`0�M���}3�x:�m��� �Y�GpA���r����v�E���4��q������19$s����&�L�a�,~�0	�q�OS�Qs��$�&�d���i�����k�s�)������N& �C�T���u?���J��E����H�����y��9e��U�'��Ok���(���@] 1F�Sb�����i������,���Y�w�Ff���p�g�������B��C�^��8��Z���qi�*��)���EM|.�������b�4s)DXt���G��TZH4���#�Z�0�h�����2B�9#ff_";�h��)n������{��������������*��m���$6\4����?�9��x��T]<����=����\A����}^9����"J&�A:�"N~���:*�8�/��9c������2&���R��O2�O[�D^0�&�E��Y�����<��b��dZAy-
M%�u�����������@k�������E�������3�AUVs��N�X�3���dY2x����@�=�C�oZ8�<����T�!Qc��)-�y���C�W1hRXQ��Y�jNz�e�g*@}�='��G;������G1Fsu�f5����W����������+?U��U}�J�_!�A[�)�\0\��C�mamk)��
j;����g@�c_�X
��H��k�Q!x�r4,��?���%���C��s��+����pO3��� su���=`��0�] �>�RR
�E�-������h~���y�hu��VV��:O�?�`���A�����rf|��,wAqx0�_���=+)��1`C`����P�H������8x�FnH����M8f����2�k��Dd�U&�����N��|��M^��eo�1�xn{A�����*W�������J>�n\�c�s��$Z��'�B��$L���R������l���d�[��[)�
���Ele
�$�W���.`1��-�����X=��w����Jt8���'�
��C��^p�fk�T/�g�[0Xx���K�1��)���8�u�����1 �z@���{��xG�����n�������������h\`)gM�����M���&��c��x���~Y�9o�������\T�Z���������$�3qa���K�����R��C�fG�[�e�-]X��<�^3�d��]��
�����2����&Q�"�\Z��fANPU����+�>s�e/�)_a�[�sV������$��$&s��@],g�j�@o24�/���Y?�0=&jH:��E�U�t��")�	��?�G�9�'UkKdK ���������X7��Y�d�wV�M"�$���M����;\z��a��Ag;b��p��l+�*#��K{2��~���� (|�3�?����C������B�����H'����%�$�|O-�d�������7.���pH/���~��Z@q4���t��t?If�xBJ�Fo����3G!�z�����1#�=���)��S��G?�\L���"S�x`@.c���E;�s�Z���W�h2
�S����������N�q����
���pJ�1|u��3K �V;J�������o����.�����O�
�c�q?#�![�&#D?5�<�.��E}\��N��
fhg4V~m��CXeg��]m��,��\	:!�H�p1�0���0�6-C����w������i���u0H������r6�x���
��	��d�fi�6�1�J����y~�S{}�H������>����A)5�,~����8����6�L�$U,����,Z��X
�-�o��%���q���)���MV����Ni��[�
dNe���� f�9L������x6@���`%q�*2B��55���W�,O���r�����_�R�Z��y):�����������s�W�?g�-�����@�F��1���x�6�b��~�Y�U��:�.��9�\��^�h��7�u	�X~!2�T��i=��pm�LpEnS�nXm|-��Q���6!��^Jz�����vjF��~���N��{��)�������qzR��XLe�4�I����:��k��m2(��N��')���`'�0��g��{������+`����x5��a���	�^Oq��]q�K��"a�u�����w������|~|�?��p�#I_��y�FV���*��eJw4�K.���Dd�wW-�6������g�����l��p�s�a��<������r�q�T�=%�n��+Z"''B�i������v����������!�M
m��l�o���71hs�]0��!x�j`<�� �|Y�2����K� �����1�ME;_c1�����<E�dY~

��8[��cPk�34p�*�2���X����n������~{6���9�������&%�l����r�#�73�n1��Z��|e ��70��<M��/�u.![,����`8�����47a4._��������������	!
3��7�n�p��� N���u'����]pt�B
���PNN;�Y�j#����BJ�8��&5�k���3.}��)�Rl��^�����,������E����G��2���o�o�!����w�W�����`�;b�{��.U>v.-p�<(w;�[���{�}�_W�k����bx����pP��U�T���:V��[�vUx���+P���N��r������L�Y��|�J
_8_j� ��ZE�z�{6�|�<P'�QMV���������_�1�P�~��Ku����V�F��~��l��f��V�����ak�l�n5��f{sws�?���f��@�����YRRNK��`������`��<9����[�O.	���>����e,Y)5�/��C	�h�z�a&��&�S�F������`bX�;��6F��0_���=���p��k4�w�[[�����n+��n9�c���t��!}�?����!��@��z,��u��������w�����Z�t�?E��e�/�OO;��w�����89��{�Gn<���v���|x���e
k����<�\�\\u,o����5d�t-�b���(����~� �����������Z1�,��(�������C��moY	����f���7��~s/�����<u�J0�XR�HG�Y�'�no�����-I��-
�0���t{������������A������UU����-`��/^��ZM�(��P�F��q��_�`���x���z�".�]h������>�+�B����V4����/-��}�9���������_�o=�������2�|�1QB�J�����]{����.��d���l��e���7h
SD�M��L�����~v�i��p�r����\
�2�D��T.��N��r v@���������p8
��
'CS��d�]�Z��	�&	��5����\{L��=&�����.�{������~�	���u���1O
�C��-F�q�o06�wFM�R1��E4����F�Q�}v���@�"ct���Q�y��M]��0��&��AR�
p�������8�w�eh�S7:�`�E��1�*�4u]s�yP�th�,H}�|��m��x:��60"v�5h����Vn����G:�}�T������gi��&��@*�4���lv��!"|����
���R���x�S`�M�i����&���zK��%�lE	M�b�99�Bek\l�)��O��J]�"����T6��fR�o�b�e��h��H5���VQ�zV<�=?sX��k�zH�ZH��t��D��<�v�9�f������g�:���q�E���c���1^D����������B���O�7�'��6�LcQ0&������C�@��!�����p���rw�����[H�px}Mv�<����+�B�f��[����H�4���!��}c
�#GN�c"%������+ZBvu<s	D�X�v�>����*�;=N��S����\�r�:�iO��Wfy�1d�IUi���)����Js�QS�gu��S.����n���lV�.A��G6pj#��'e�����7��Zt������=j�nVj"(E�# IL0�]
���P�����m��HY��Cc}���?7(��WPLQ��B������&:W:$y�2�<�Vy���J�Lkg\���`i�����#�����T���P������B1<V�sF
��#A�B�����$\B\���B�g���5�+g��>��fm(�@!c���������`�R|18��%b=��������>��)�-rUf��,#rZ��~U6TF��U8S���g��4�i��TLT�F��:H��$��'3���8����v
�%��I�����kvL��3���r��I��b]$���7��	��EF�@l��XN�vlQn}�=2K��.����������VM�����>��U��pI�^J�y�.!�Q��g�L8�
�+��9��h����-�,vW}�H�
k���\�zFW"	����O~�
�9��^bFn�A ,��;�}������D����L=���L����'��=#�`k�	th
�����]��w���Z
�'q��!wN�5n�A��C����0��Z�(�y�)���7�4�D[������/Lw�]��~7���$���'��}#p�IC�Z�r+'M6o@.��)�X'�|i�����BUS���
��H=��P*l��mR���vfEu,Gf�^�TI��v�����;Y����J�����JVn[-G��0�5���mN~��&�
?t�|��}��d�kt��x�����4�>�Lo��\���Y2�8�pO���x��0H��	��8��"�	���:����@j8I��I!L0���������i��+�yji���<o5�z��<�t��"%������hh��3.S�����&?�L����g���FW��{�
dt�xZ��E7A�6��q���B�@AL��g�A�=����	�K+�<��b�]y���r������l����E����'���xN�ZL��0;�p��4z�+��z���t�6�v>��<�k%����,���	�?�����1�u1c���5�&�����uc�_�������7����V��_md#���U�6��2�����f��R
\����������/�.G��������H������o��������g��������j%/eQg9F��~�d�+��H�(,�P}��gA���[$cG��G"�]�3Omz89;�j3���.+��dK��$VR��%��#�;�/0�����D0z���|\;�7�N@��D��!�<4���C{(��"������p�E�Q�&T%��ip��#����}8���48�Z���iP�a�l4����(1�������P�D�F@��y�N<���@DK�tC�9,n�3�j;S�(?^�qY�������-Ki���Kj�fW�=�"�53��F�P�j�)��Z��i�FUb��V����dL2��6qM��J����L�U'�t:� ����w����pk��{<~<�\:�/$���K��m��c=s�+�����\$������������$;L�;����?��1�M��h��g1m����1l���>'�#^�N�m~^�d�`\i��v�0���Lq�[`���Td�Q��"����`b)��F!��c�r�Z�� �%dV��,��'��t��Dg�y�O���X��_v;aR/)�����R�RM�,������_��I����y5��'|�������T��g���rF�p&0l59�jZ��E�N�g��Yx���� �LA!�hf+=������"�(i��Yx��'���D�B�@�����\�ap?�2�
V�����~%\~)�������g.��
���	!s����/�Ei��|Hm�"�1�����R�����h�J�@�&G=:rZ�-��_��)���-M�V�Y����YSHD�v��>G  ��1C�D7�u�8����G)��7,,7�\��&s�����������*a)Z�@n�h��u���IX�J�isk�VF��5+�if�AA�`��Of����b+�E487�F:f�'o�|��KF����\�wj8g�E�[�����TCv�����;&7�6�p�R�A�U��u�+����^��VTs�����o��������%���C�,���������[H%��^�Z������q�7��%��$+�V(�}D�����d%����]k�m���f]m�r�[�����s���S��H�5Q��Gk_7����f�'
`>W	  ^i0��J�}-E1��+r#=JE��AH��aW��dO
g�0�m����Y�K�L�,�7Z�c�/����h�����V:�X�
�2�-�s(kt�ot�fL�_���U��V��{��t}�K�v x�����%�bT�4���uu��7��!L���^���
���?H���jV`k��hgC�l�
�����w����SR�/���d��x��!��mCE]�9�"�S��4�(B�JF�����lq�-�<X.)�D-��H0X��W������FL�5<X	�NB�
�k�5�F[;J��m4�������J7P���`Bo��,Aq�d1V[<)6���'���ux�b����^��{vx~��sUs����{J��}��/���C���xo�:��;��\]�t�_w����7'���'��Jn4%�FE���)-*�R��~{g�7��~���������
��������4��X#KL�!��
c����,
m\��V��4�SQ�N�S�e���8`��.ZhJX)����!1�����0L�i�
V���+�w���c���?���������q?,o��\�
~����=�J��J�q�pz�q���Gx���cO�p��#�!�sw���������W6����W�����������x�R����f�����j����/��e��j���&i'9xr2����R�AYI���],J��aZr�K0���(�P���sN�l)�����c�n�^0� ���
��`�z��D'��k|��.T����b\!j-�mL�E���M�S�o{��^��o4v����"�B�(_y4!_)�cl5k{���okA�� ,��!���Wro��&sg�Q4����	��hA�5�+y��K�{
���9[c�WA�������u5������~r~t�9�V��D(�9g�e� ������H`����^C`+;�wG�!L�	w�|1~M�U��Pt�iu�V�T������u�3Xg�Z[��������U��`4��)ylU�y�?D��G�(�E���F�u[���iv��O`��|��:
��s�{P������dJt��.�E�i��'�u��SJ���_�?�J.i����&�?���}���Q&�����T�e�7���������"��L����n�Q
vB������]tv�dx�mv��������C����~�|C�S^��=����K�d�x���v�]�:���T	�_�U@��m���v�5Pgp����.\cq�v�������E�9"!���>O����F)�gE�,~�fq4���j��X����?t/�)��}����X��N�"�z?������m��HH�x�v+�/����+k�6u��������X�|E
v�����IS��G����S�Y��
�������1V�5�����]�P����������%��<Q�E2�����o�0����e�lk�Vj�)��y#|�d�2>��=_�F)|�;+��/��@�����W����E)���CWm�L�u%�"�/���s��D�������[�b^���6�cJI�
a@c�@������;����s��F���}�����`�W�A"�����d�_���@�Av�{�r�>�oU�J6�uoU)}Dm�������5�_��l����V���_
���v����]\�a[��$oOPP~X��q�
�!�'��L;��|�a��nB��eM��z�Adw����{��Zk;7�b�/:� �M�"�|M�T�1��p>;g[���'�_)B?	�%bZ�J2��p���AyJ�SSN�L��C���4�g���P��T�xB�c=��hg�K��(*�����wq|���`��k�H\T���_�����g��UN�aTG��id&�U�E��5>����]�p��z\��5�`�Z C�m���x�~{�����������C�2�V����I�D<�9�C�Ct��G�vx�7���%�[���kf��ts��@m=
.�<��5I0�����7.~���`a5�B��F9�\C8P�/h/?������T�.�%��T�gqo1�4�����������Be�l��hp�N&���@�Lt��0l��tkfZj��a��L�]8�a!�ECM]�);�O4��D�������h�V"J!�k�L	#��CX�0�u��H�L���.�,��S
��}�H/�~����V�.c����,�[��n�N�rs��S�����k��R�x��B�� �}F�+&y9��-��e]{������s�����@�c�p1A;���|@���,�J,x���@���C��y�b�����������(���^<��v�X�vJ��U�f]��:9�#����
��5����(��b�oO�����{���b����f�C�
�v��>��M 4<����������n%�q��9������E|��!������A�VV�+ze�g��C������x���8{��?�z��~�T�-
�]�+i?����{G��	��}�gM���A�A�H���PC�>N��A�;(�n��.��5� `�"����)�jJV�W���������sGI�vD���������,6��gDv4�TZ�N����w��B�g��e^���j@y�#mh����[�����/�����]}�b(��
j�_C��92P�H��@�����!}���A���c�!��J�C�)1��o��gH�VO��w�������Y����6�R���FL�|��I{� �
����i0�������;p`�����;
�<�U��|9T�c:T\�RC�d�K���dq�`p�Xd�	<�#slp�E���/���.��BqIUD���')F8q&Y��,G��Y����o#��N���]��oc�����*��Z����4�#�����1^��3����;��1�������#���'S������|'2	�V��JzK��e�+����-��@�gq/+��yC��R��7#��/{��R���n4�z�SJp�wHu._O������tV�~y��h�LV�{�O�5n��O�R3��������@���k,s�����������2��BT\[L���8��,j��B���@5��G4��p�-�7F���W�{vr��Tu�\��/�M���zx�����Z{;Xo57����/��c��hN|�p2��6��n�_���^����I��^��������YE�Mk:�z��JQ?�&:Bf�� 7���m�x��[8&c�L+��
n�#>k������l<8N�Kz%�����)�Y��c�����ne����c^��Y���k�t��������`�1��� �*��Z}6����I1'�>f�3���l<O�plu�P�n�6�V����P+�1\W��%3�� ��b��#����B���P�i���J>��U=�^�n�K��#xp�B$�O���e\�n�v���m��(U���s�3�0M�Y�!��y���`"�"��$:���Bhmm6wj-nh�~-/>z�9�c���UW������7�U�E���b	n@�r6}���H���r�Q70j��K�,�C���#���OY����a��:�������u����Es�������X�@�n�/�RIt�5-��2��$I�e��'���KGA���OX"{�������������n�Q`J�.T�	)/-�� Ixn�T�h��6t�r�($���S��B,Z���Y\P��P�V���������
�p��T*�P�#Q����������kn�DW)�6p;�+C����/������������n���e��1����ZkS��V�M_�!�K���'�/r���X2�sV��[�B&�� �K�t�V3�1�QS�� ��������� ���/�����&�G+���V��� ���	�4
TW{�C#7H�����E�����!KoJ���C���JK�_G��!'��]DyrC)Q'F���-f6�!nx��U�S���s�Q�OJ*c�&0��Z	�1�d8A���.V��h,OA��n�%��~�#U��|X}
�f������A����G��#.�`L"3�O���h+7yW]�9�)%���>0������
PE,���{S�S����(���7�U
d�-��0|[������y����C����.C��������3�Y����$��v���<b���3��H6~��#��v-�g*y���������e�4�������tf+9�<Ds��;ca��VV�k���B�
�x��\�����rR�4|�G�����K��H�%�FF��mV�3�33�������!�G(����Z@L�<{[�5�d���#�H�X:���q?���z����|p=w(��������[�^����
�M[����PO�;[��&���iz\'OP1T�g�j�z���Nj5a���r�P�%�<D/"m�VI��Z��3m1pu1�|o�|�~�|W��2�a��m"�k:�L/B���D@��-IZa�$��Tm�lv����#�T��rT>h%���[[�\l�`���E���@m�oG�1�������J���JE��=�#�����L+H=q�s��O+�VS�9������sD���]�	���:��}Q��d�~2�!�0)v������;��3��}���5����`e�[��D�X��i4dA%s{��6W6Vdg{)}wo�������I%��
=��I\���'�J���-��'��cE����-�<�[xu�����zI+vkpW�'|�g�7q���X$�]�m�Kx�,R���o��x�2y*s% ����:!�|V��^Wb����n;r�	��~�5��+�4��!���^�4jk�=+��\u�;77�����c�6�^�����G_�z#�G%��l����^���x?rm��2,���FR�M��y~e/
�-���s'����l���h��W4_�f�~��\�.5�O�r@��>
���	�,�rYvxZf� �����U�j������*NKtc���8t��	�Zr;`�o���N�HnW��:f��9�#�?z�d����u.��R������L9�����hf�);|��7$tnBU	��}b�}�f���M�j��3��+��V����~�#%�;��\qVx��vU�������w��m�����:�H��H�I'�EVb�S$��4��N9	��)B%�'�i���~�o��J'�!&���aq����������oO+$�
mOjf2���+�1����X�9���R.a��Uq����:F_
�"�h;��������F���Q����$�k�w>���i�)�y�R)���u�S�q����{!y��o�t|9��������h`�e��(�M�[���[q@�3��s��-�^�x�
?�G������n0�����n�w�
xU��QVYb��q��'��
���d�;�z���Og����|��u��3<�u��_����Y�P�_h�B���0�a;��FQ� ��{i�������%���<N��K0q��G5�H$�,.S�3>P���B��Q1v����������&k������!������#<�o�G^�cV=��'FM`.����4sw��D�J��N�l��%�d���\��a�%���!9.�����kX�rB96�-X�3Vw��U���2����m�:`0Nlb�n��&:Y�lJV:u�k$"�1�@�0�;e�c���
x��<T�9�0V���0��`B��98I8�u������*��j����uTH|�^]����6@���6�?���|yw#���r�mn�����o��U��o������G't���;]%��V��J�P~8(7���\P�'r���k�/%���5�!���#5r��:u��{B1��
@���g���b�2i��W{��F���z�,U������vg�T4A�R���`EzN$�1�E�M�j������E�����FQ��9��1e�c�~��e��*���!��-M���B���*|-�l�T��d-�2��dH����'W/�>�yQ
Qu�v#����'���)H5����WI|l�`)���m�P(���-F�Be:��a����"�:�/E���,�N��:������w�*�lo`5z�=�u1�5��7A�v��7��� ���C�x�t�tZT��<���L4��q�)�;.y�2��E*�?�X)`�2Q�<aG��6\���c(��x�yO'\\r���(��KkQzVf�KK�:��/xdVS+��zq��7���5�$�5@��N `n�`k��F��A2���./�?=��O��<���a7In"
4�Q��fE���A��F�]�`S������^^��y�Y�uH����	�@*��bMc�I��1��O�/��;����{������VU)����0��l6�h�j�_^�r)����CwTxVWQ���&�?#�O�0�j��z�6����6�x���ScK4n�H�w���<�E�"�G�b���pF�jDH���(�c��)&26�8zq ��������%�S��>�?�
c(o+fNgN|�>�b�R>_�Y��5�F,�#�}�\�O�7�������u<�	2���uB�x��F�.��'2���f+���d1���o�F���D����a|��(��3����%c�=���d�
�%	85:�?��n�
v�pPMN"0���;i>-��S����|]^M�}
(r��"��PH=��g��K�kt(D�Nx�+<���Q�KV^"�!]#�����&���`�x|'�o�����;�	z/��n��o��t�"��,6���H��h��v��*�A�
a�/�7f8���"��6�/�NB��X�QH��6APx
v�j]�O�|�y�H��Wc�����	Q�����FhR��d9����������&�N)mRp��?)A�V��rs�=�[�5�4@%�S��'�_1�z ��)h}g��rN��Y����;2�!�����4�H�,�)(�rNp�1�����D�T��e!%�.;�����ZrS��~���E�������4:|X{���@U2��[C�xZ	�:���'Hb�
�H��ehhS5�OX� �J<B���"���=~�f���OY"��3��qo�/���C�s���G�����kZw���fD-V�����F?\�h���x~�C��Yb#H^���n@�|tm]��q��#�����T&��&�,��V��#.!�aI���K����8E�Q�E�%]`*���j��cA��x����=$����3|F���G�T'���^��3��������������4+C{�?�EAq�5p5J��V#l{Q���:kHa\�F���0K�a�=�GM���S�B��E��$
t����0�8U�����S*"5�]��	)�J�=o���d*�\�M������!�M���(2_��m�E����<� �(a���Oa�E�_2��-B@S�.?k�Gp�+��F/6�,8�nak$s���������P(�	���}/�T_��������}*M�n�Ig�u�4�{
�ON>�?~!���3��s�(g��Mrc�p1��'����->-�{h�2�Jma>Z1(6�
{r�uR�c0N&��&��l$fK,�?���7�Wv$�N���A�oUY��������*�����2��I�(Mk��5^X��]
��8�|�x���J3|*$1��D��0�Q��h����y��~i���'�����iq��s��vy����{+4���q���5[���O	�_�PR���<i�47�m�2��N�=�
]&a1���[����l)���b&�4�=�ur�����p�nq�Q�{���4����B|��,N ���Z��3���,U�A�(�97O:H�J!}�`_��	��r� 0o�B�,5��qwO8�����A�8��<5���5�Fy�b��h}'�!����������4�W������bm�zQ�e`�gV[X�uG+D�[��j��eQyE�t5�j����~,�L���Ldk=�Z��9p�J���Vh�v)h����a@DG{o��V�����m
:���-���J��w%�Q�pm8�C��v��;����J���NCv�RY\Y����56+$���2������f�J
_��6X�+m]\���R���kAZ��^����%�������m�]n>��]�N���������Bh!giz�i�
p\<Qy��n�^
�PN}���S�6x��3�"�:������8t)��o"�5v~��M���I��Y����nl�zL���U6����������� ?#���%q�q�=������j��4���� z��g�o&�
�%$��RA�_1^�[��
����#2��\e#���A�5`������������O�otxBg��d:�����+���Q��I&��Y������r�p�d����,YR���	�P����5��6�g�;x�B�j-����UO/������\��Y7F�}$�� �
^c����J���N�^���*:�2�L���+�l��LHs���V,�<j��N�\g�,����y��9�b�:YOa���?!���K�w��&P�A:T�U��>����4>F9�L��<�A���<���n���k�w���v	��.(������Fy�?���[����Z�`�H8�@���N'S,��\M��30�XA.�dX�
�u��)��mC=L��F��&
}
��`��%�����H��Eg��v��a�1�h�i���I��t�BA��.�5��K��-�a*��[D���3yL(����'�S��t)�v5b���b�<\x,�w����"D�s��Y��k.A��8.!q�������F��>g����/}D90��:)��'��)��L��$G*���nf���l�������}�>��a0�_�<��lLq���YC�'���Ua��<�u�-X�~.e������Y�v~q}��r�x�Cx&V7�Xm�`'�G3��4�*x��!�8*��IRhy��7��u^HQ����vQ@Qt�����
 ��7�3�6�
�[,�j��	E���n���N�M�K��6Q;������l�9�"ga�����5�|`����L��|�r��!e��2��f��B�C��N�.��!
�]��x2tuk�GHrOE������^��	�{S��D�%2�
3����go��F'�w��P�k�2o���%v���x�W!��������7���JF�������V3�>��eUuUun �����.�W�L�t���%��,�DI�z�P�_R������CJ�Y���"��vjbiq�K?�TU�����^�RIq� �P�.uu�����u�-<(@�{�cz[�T)x�T��!�����q8��E�W�(Z_�{��a�&��}�}QE�X_���Z�����F[C���-������3�#[u���@D|
s"v�@Q�����U����q�I~!��J��5�)��#_�Bs��LP7�m8`
.8U->\���r�3J$����)�`�DM*�0���qs��;�*?�+v��=�+����u ����6fw��n�����A����<����� �4��TH��#Di����\9�1��5V����<q�=c�[��!^��	����2
e���^��������!;9���<e��u��gt�PF	[({+TU�k�H�e���Z���'�����3u���+��vm.C���a
�c�P���f�{���G���j�n��"g�g���fh��<(��V�^��J����:�r59B]@65��E�3y�3+���q���N�p9!ru�>D:�,����B �(��P6���d~�h�L*al���O�|���G�;����|��Lu���.��:~Nn��ij�����:�FT��l�������=��5s����}%H�`�5C�k���������XB�t_�+�����r�B�f
���������|�.���N���+����r;�g�%��~";�k�<��?������@�Z��7�&�%�P ].]�ab��5���(��#�O�iBu;R�*0o>�by�zV�����hi��Q����F��HA�����@����/��~���&;d�p?0$����vD'���yrJh:<���Z������ /���X#a�]}<�<��Y���s%l�3���+A;�%��rfU�������������������c�����)[<w�Mc��t�A���Eh�����|.)k��~����@�(�*�$��Z��DG�Y����b�R��� 
�"N�TT���u��-�l&0��K�\�L�2��3 ���C}7�x46�i�R%Q#��x��J7R�%��/�)�n��./C��V���z�*]Z�����1JA��PE������QgzM
�����U�_�-�����,(�K�X%Z�����}
?��a��h��FJT�����1~q>,�Jx�(^��"��J���I�9R��|�����a��,�	b������S��])��s�4�N���k�T�J��:-��z�s��K��0|�E�k����������=�]���g�z��-D������8���\�/�3o;�<��F�N7j6GI��?�1�����g���3@(�=�?ZL����l����F�������E
�qH�M<�g�)T�A$�<�Hf
<�E|�&f�Gi�l��cT�\�l�6���]���a�t�q	`n|:��8�>�t���=�>�L:�/��b�����W����?�$�m��e�p:�d �c���;C�EH��.���%=8�O��\�V	Y1����2����8[�<�O��-��Z�Vw�5����N�~�s�(���f�t"���G�N����c<]������@e��O��#�
��yk�Q�3#��;������JG���9b}�C0:� �'9��r`� ��7��U�D{M��j���$��$�5IvV��T"i���h��ppF'�6Q��d>&B��(�����}��SYo/O~9=����tb\��Wzp�������`�~���p�a	�=�
�Rl]Q���_���e��H����Y~���Ln� k�u���,� T�J�~`�<Z,�'��qh��Gl���~�1�[]
�a�����6�U���{�����EZ�N��;>������U�0���v@���Tj~�����y����������]��������S�������d1'�5e�N��64��y�L��dg�x���Ss�y��a�����
<aLXt+�{oq�G���	12B�������r�$OT�SU�����qA��N	a�B����<1�{�bU�q�X�w)��#����1Ag�����Wpx9��T� k#���EJ�	�yC��zHT����
��}� �������U�������?����}���������:W]�5��:��qG��.���I�^��.�?��v���������],�#��p��4 �.y���Z��!]�`�x���|q��f�+Ai`��������7���6����v��,e�b,���bv��bzp���A��&Q+J���E���]�`�tH��MI%�I��0&�Y���8Xc��P)��B���Ar�?,��+@���jw��6nO��bqI���P��][C�;�X�����>z�.&]!��=Ko�a�lr`e
���}���������#�Olc~����XHt��(iX$��s�����A���F�>c��G7
�( ��,���Pb�fsmT� e��N!>lL��3f����tN�?�<�������NT�`7�A}xdI�Y�X.�gp���"]��+a q��N`���
%^2�4#�C)t��P�d��)fQ�U��5/z'7TYo���A�=$�pn-�VTqQ��U88�,�����V��?=�xr�5r����*a�,���+�Yp9��w�$6����A����0	D��1;f����u���x�V�]�.RI���Z=EJ�(���G����:�;7��
N��N.������M$��������-
��I������*��6�|�"E�B�n�+��|Q@Q���i���h�����zZ�����������7�����;5�x�=	�7A'����R9~g5��I(_0����P�d�jJ�:/���Mz�=��>AaBb��7
����-���vx�B_��E_��7�{���z�/A��C���%��(xf�6�������v�R�Jr"��]������5�B��U�k�Q����=�g��*��G�����U�kQ����MF�I�l�G7a/v�jYE�����D7r���\��i�X�9�|����s��$��A���@��A\��%���^B���I���uM��5O��]�'s;6OIr&Q����I�0��
eO�|�#����*���������A���8�p�y;�nz7a��m�}?o)�����\Tr;k�� ��	�NHy��E2�����r���)>ss�_�8�����mD�|;x�y\���G@L5dTd���3�%�����S�3�Fl�������Jj�������|�+�p�8H�,��G7�N��l�7q��z~�T�R�B�Aj����+')��`����
�=���n�]d��,�uS����
K�F�w;���E��Vz>S$n9�n�5`m��<)-����^�k{m��������^�k{m��������^�k{m��������^�k{m��������^�k{m��U��_W}�+8

#54

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#53)

Re: WIP: [[Parallel] Shared] Hash

On Wed, Mar 22, 2017 at 3:17 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

If I follow the new code correctly, then it doesn't matter that you've
unlink()'d to take care of the more obvious resource management chore.
You can still have a reference leak like this, if I'm not mistaken,
because you still have backend local state (local VfdCache) that is
left totally decoupled with the new "shadow resource manager" for
shared BufFiles.

You're right. The attached version fixes these problems. The
BufFiles created or opened in this new way now participate in both of
our leak-detection and clean-up schemes: the one in resowner.c
(because I'm now explicitly registering with it as I had failed to do
before) and the one in CleanupTempFiles (because FD_CLOSE_AT_EOXACT is
set, which I already had in the previous version for the creator, but
not the opener of such a file). I tested by commenting out my
explicit BufFileClose calls to check that resowner.c starts
complaining, and then by commenting out the resowner registration too
to check that CleanupTempFiles starts complaining.

I took a quick look at your V9 today. This is by no means a
comprehensive review of 0008-hj-shared-buf-file-v9.patch, but it's
what I can manage right now.

The main change you made is well represented by the following part of
the patch, where you decouple close at eoXact with delete at eoXact,
with the intention of doing one but not the other for BufFiles that
are shared:

/* these are the assigned bits in fdstate below: */
-#define FD_TEMPORARY       (1 << 0)    /* T = delete when closed */
-#define FD_XACT_TEMPORARY  (1 << 1)    /* T = delete at eoXact */
+#define FD_DELETE_AT_CLOSE (1 << 0)    /* T = delete when closed */
+#define FD_CLOSE_AT_EOXACT (1 << 1)    /* T = close at eoXact */
+#define FD_TEMP_FILE_LIMIT (1 << 2)    /* T = respect temp_file_limit */

So, shared BufFile fd.c segments within backend local resource manager
do not have FD_DELETE_AT_CLOSE set, because you mean to do that part
yourself by means of generic shared cleanup through dynamic shared
memory segment callback. So far so good.

However, I notice that the place that this happens instead,
PathNameDelete(), does not repeat the fd.c step of doing a final
stat(), and using the stats for a pgstat_report_tempfile(). So, a new
pgstat_report_tempfile() call is simply missing. However, the more
fundamental issue in my mind is: How can you fix that? Where would it
go if you had it?

If you do the obvious thing of just placing that before the new
unlink() within PathNameDelete(), on the theory that that needs parity
with the fd.c stuff, that has non-obvious implications. Does the
pgstat_report_tempfile() call need to happen when going through this
path, for example?:

+/*
+ * Destroy a shared BufFile early.  Files are normally cleaned up
+ * automatically when all participants detach, but it might be useful to
+ * reclaim disk space sooner than that.  The caller asserts that no backends
+ * will attempt to read from this file again and that only one backend will
+ * destroy it.
+ */
+void
+SharedBufFileDestroy(SharedBufFileSet *set, int partition, int participant)
+{

The theory with the unlink()'ing() function PathNameDelete(), I
gather, is that it doesn't matter if it happens to be called more than
once, say from a worker and then in an error handling path in the
leader or whatever. Do I have that right?

Obviously the concern I have about that is that any stat() call you
might add for the benefit of a new pgstat_report_tempfile() call,
needed to keep parity with fd.c, now has a risk of double counting in
error paths, if I'm not mistaken. We do need to do that accounting in
the event of error, just as we do when there is no error, at least if
current stats collector behavior is to be preserved. How can you
determine which duplicate call here is the duplicate? In other words,
how can you figure out which one is not supposed to
pgstat_report_tempfile()? If the size of temp files in each worker is
unknowable to the implementation in error paths, does it not follow
that it's unknowable to the user that queries pg_stat_database?

Now, I don't imagine that this should stump you. Maybe I'm wrong about
that possibility (that you cannot have exactly once
unlink()/stat()/whatever), or maybe I'm right and you can fix it while
preserving existing behavior, for example by relying on unlink()
reliably failing when called a second time, no matter how tight any
race was. What exact semantics does unlink() have with concurrency, as
far as the link itself goes?

If I'm not wrong about the general possibility, then maybe the
existing behavior doesn't need to be preserved in error paths, which
are after all exceptional -- it's not as if the statistics collector
is currently highly reliable. It's not obvious that you are
deliberately accepting of any of these risks or costs, though, which I
think needs to be clearer, at a minimum. What trade-off are you making
here?

Unfortunately, that's about the only useful piece of feedback that I
can think of right now -- be more explicit about what is permissible
and not permissible in this area, and do something with
pgstat_report_tempfile(). This is a bit like the
unlink()-ENOENT/-to-terminate (ENOENT ignore) issue. There are no
really hard questions here, but there certainly are some awkward
questions.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#54)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

Hi,

Here is a new patch series responding to feedback from Peter and Andres:

1. Support pgstat_report_tempfile and log_temp_files, which I had
overlooked as Peter pointed out.

2. Use a patch format that is acceptable to git am, per complaint
off-list from Andres. (Not actually made with git format-patch; I
need to learn some more git-fu, but they now apply cleanly with git
am).

On Thu, Mar 23, 2017 at 12:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I took a quick look at your V9 today. This is by no means a
comprehensive review of 0008-hj-shared-buf-file-v9.patch, but it's
what I can manage right now.

Thanks. I really appreciate your patience with the resource
management stuff I had failed to think through.

...

However, I notice that the place that this happens instead,
PathNameDelete(), does not repeat the fd.c step of doing a final
stat(), and using the stats for a pgstat_report_tempfile(). So, a new
pgstat_report_tempfile() call is simply missing. However, the more
fundamental issue in my mind is: How can you fix that? Where would it
go if you had it?

You're right. I may be missing something here (again), but it does
seem straightforward to implement because we always delete each file
that really exists exactly once (and sometimes we also try to delete
files that don't exist due to imprecise meta-data, but that isn't
harmful and we know when that turns out to be the case).

If you do the obvious thing of just placing that before the new
unlink() within PathNameDelete(), on the theory that that needs parity
with the fd.c stuff, that has non-obvious implications. Does the
pgstat_report_tempfile() call need to happen when going through this
path, for example?:
+/*
+ * Destroy a shared BufFile early.  Files are normally cleaned up
+ * automatically when all participants detach, but it might be useful to
+ * reclaim disk space sooner than that.  The caller asserts that no backends
+ * will attempt to read from this file again and that only one backend will
+ * destroy it.
+ */
+void
+SharedBufFileDestroy(SharedBufFileSet *set, int partition, int participant)
+{

Yes, I think it should definitely go into
PathNameDeleteTemporaryFile() (formerly PathNameDelete()).

The theory with the unlink()'ing() function PathNameDelete(), I
gather, is that it doesn't matter if it happens to be called more than
once, say from a worker and then in an error handling path in the
leader or whatever. Do I have that right?

Yes, it may be called for a file that doesn't exist either because it
never existed, or because it has already been deleted. To recap,
there are two reasons it needs to tolerate attempts to delete files
that aren't there:

1. To be able to delete the fd.c files backing a BufFile given only a
BufFileTag. We don't know how many segment files there are, but we
know how to build the prefix of the filename so we try to delete
[prefix].0, [prefix].1, [prefix].2 ... until we get ENOENT and
terminate. I think this sort of thing would be more questionable for
durable storage backing a database object, but for temporary files I
can't think of a problem with it.

2. SharedBufFileSet doesn't actually know how many partitions exist,
it just knows the *range* of partition numbers (because of its
conflicting fixed space and increasable partitions requirements).
From that information it can loop building BufFileTags for all backing
files that *might* exist, and in practice they usually do because we
don't tend to have a 'sparse' range of partitions.

The error handling path isn't a special case: whoever is the last to
detach from the DSM segment will delete all the files, whether that
results from an error or not. Now someone might call
SharedBufFileDestroy() to delete files sooner, but that can't happen
at the same time as a detach cleanup (the caller is still attached).

As a small optimisation avoiding a bunch of pointless unlink syscalls,
I shrink the SharedBufFileSet range if you happen to delete explicitly
with a partition number at the extremities of the range, and it so
happens that Parallel Hash Join explicitly deletes them in partition
order as the join runs, so in practice the range is empty by the time
SharedBufFileSet's cleanup runs and there is nothing to do, unless an
error occurs.

Obviously the concern I have about that is that any stat() call you
might add for the benefit of a new pgstat_report_tempfile() call,
needed to keep parity with fd.c, now has a risk of double counting in
error paths, if I'm not mistaken. We do need to do that accounting in
the event of error, just as we do when there is no error, at least if
current stats collector behavior is to be preserved. How can you
determine which duplicate call here is the duplicate? In other words,
how can you figure out which one is not supposed to
pgstat_report_tempfile()? If the size of temp files in each worker is
unknowable to the implementation in error paths, does it not follow
that it's unknowable to the user that queries pg_stat_database?

There is no double counting, if you only report after you successfully
unlink (ie if you don't get ENOENT).

In the attached patch I have refactored the reporting code into a
small function, and I added a stat call to
PathNameDeleteTemporaryFile() which differs from the FileClose()
coding only in that it tolerates ENOENT.

Now when I SET log_temp_files = 1 and then \i hj-test-queries.sql[1]/messages/by-id/CAEepm=2PRCtpo6UL4RxSbp=OXpyty0dg3oT3Vyk0eb=r8JwZhg@mail.gmail.com I
see temporary file log messages resulting from both private and shared
temporary files being deleted:

2017-03-23 18:59:55.999 NZDT [30895] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp30895.203", size 920400
2017-03-23 18:59:55.999 NZDT [30895] STATEMENT: EXPLAIN ANALYZE
SELECT COUNT(*) FROM simple r JOIN bigger_than_it_looks s USING (id);

2017-03-23 19:00:03.007 NZDT [30903] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp30895.8.1.0.0", size 9749868
2017-03-23 19:00:03.007 NZDT [30903] STATEMENT: EXPLAIN ANALYZE
SELECT COUNT(*) FROM simple r JOIN awkwardly_skewed s USING (id);

Am I missing something?

Now, I don't imagine that this should stump you. Maybe I'm wrong about
that possibility (that you cannot have exactly once
unlink()/stat()/whatever), or maybe I'm right and you can fix it while
preserving existing behavior, for example by relying on unlink()
reliably failing when called a second time, no matter how tight any
race was. What exact semantics does unlink() have with concurrency, as
far as the link itself goes?

On Unixoid systems at least, concurrent unlink() for the same file
must surely only succeed in one process and fail with ENOENT in any
others, but there is no chance for this to happen anyway:
SharedBufFileDestroy() is documented as only callable once for a given
set of parameters (even though nothing bad would happen if you broke
that rule AFAIK), and the code in the later patch that uses it adheres
to that rule, and the SharedBufFileSet cleanup can only run when the
last person detaches so there can't be a concurrent call to
SharedBufFileDestroy().

If I'm not wrong about the general possibility, then maybe the
existing behavior doesn't need to be preserved in error paths, which
are after all exceptional -- it's not as if the statistics collector
is currently highly reliable. It's not obvious that you are
deliberately accepting of any of these risks or costs, though, which I
think needs to be clearer, at a minimum. What trade-off are you making
here?

There seems no reason not to make every effort to keep the stats
collector and logs posted on these files just as we do with regular
private temporary files, and it was pure oversight that I didn't.
Thanks!

Unfortunately, that's about the only useful piece of feedback that I
can think of right now -- be more explicit about what is permissible
and not permissible in this area, and do something with
pgstat_report_tempfile(). This is a bit like the
unlink()-ENOENT/-to-terminate (ENOENT ignore) issue. There are no
really hard questions here, but there certainly are some awkward
questions.

Much appreciated.

[1]: /messages/by-id/CAEepm=2PRCtpo6UL4RxSbp=OXpyty0dg3oT3Vyk0eb=r8JwZhg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

#56

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#55)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Mar 23, 2017 at 12:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Thanks. I really appreciate your patience with the resource
management stuff I had failed to think through.

It's a surprisingly difficult problem, that almost requires
prototyping just to explain. No need to apologize. This is the process
by which many hard problems end up being solved.

However, I notice that the place that this happens instead,
PathNameDelete(), does not repeat the fd.c step of doing a final
stat(), and using the stats for a pgstat_report_tempfile(). So, a new
pgstat_report_tempfile() call is simply missing. However, the more
fundamental issue in my mind is: How can you fix that? Where would it
go if you had it?

You're right. I may be missing something here (again), but it does
seem straightforward to implement because we always delete each file
that really exists exactly once (and sometimes we also try to delete
files that don't exist due to imprecise meta-data, but that isn't
harmful and we know when that turns out to be the case).

ISTM that your patch now shares a quality with parallel tuplesort: You
may now hold files open after an unlink() of the original link/path
that they were opened using. As Robert pointed out when discussing
parallel tuplesort earlier in the week, that comes with the risk,
however small, that the vFD cache will close() the file out from under
us during LRU maintenance, resulting in a subsequent open() (at the
tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
fine to assume that we can sanely close() the file ourselves in fd.c
error paths despite a concurrent unlink(), since we never operate on
the link itself, and there probably isn't much pressure on each
backend's vFD cache. But, is that good enough? I can't say, though I
suspect that this particular risk is one that's best avoided.

I haven't tested out how much of a problem this might be for your
patch, but I do know that resowner.c will call your shared mem segment
callback before closing any backend local vFDs, so I can't imagine how
it could be that this risk doesn't exist.

FWIW, I briefly entertained the idea that we could pin a vFD for just
a moment, ensuring that the real FD could not be close()'d out by
vfdcache LRU maintenance, which would fix this problem for parallel
tuplesort, I suppose. That may not be workable for PHJ, because PHJ
would probably need to hold on to such a "pin" for much longer, owing
to the lack of any explicit "handover" phase.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#56)

Re: WIP: [[Parallel] Shared] Hash

On Sun, Mar 26, 2017 at 1:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:

ISTM that your patch now shares a quality with parallel tuplesort: You
may now hold files open after an unlink() of the original link/path
that they were opened using. As Robert pointed out when discussing
parallel tuplesort earlier in the week, that comes with the risk,
however small, that the vFD cache will close() the file out from under
us during LRU maintenance, resulting in a subsequent open() (at the
tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
fine to assume that we can sanely close() the file ourselves in fd.c
error paths despite a concurrent unlink(), since we never operate on
the link itself, and there probably isn't much pressure on each
backend's vFD cache. But, is that good enough? I can't say, though I
suspect that this particular risk is one that's best avoided.

I haven't tested out how much of a problem this might be for your
patch, but I do know that resowner.c will call your shared mem segment
callback before closing any backend local vFDs, so I can't imagine how
it could be that this risk doesn't exist.

I wouldn't have expected anything like that to be a problem, because
FileClose() doesn't call FileAccess(). So IIUC it wouldn't ever try
to reopen a kernel fd just to close it.

But... what you said above must be a problem for Windows. I believe
it doesn't allow files to be unlinked if they are open, and I see that
DSM segments are cleaned up in resowner's phase ==
RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
RESOURCE_RELEASE_AFTER_LOCKS.

Hmm.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#57)

Re: WIP: [[Parallel] Shared] Hash

On Sat, Mar 25, 2017 at 7:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sun, Mar 26, 2017 at 1:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:

ISTM that your patch now shares a quality with parallel tuplesort: You
may now hold files open after an unlink() of the original link/path
that they were opened using. As Robert pointed out when discussing
parallel tuplesort earlier in the week, that comes with the risk,
however small, that the vFD cache will close() the file out from under
us during LRU maintenance, resulting in a subsequent open() (at the
tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
fine to assume that we can sanely close() the file ourselves in fd.c
error paths despite a concurrent unlink(), since we never operate on
the link itself, and there probably isn't much pressure on each
backend's vFD cache. But, is that good enough? I can't say, though I
suspect that this particular risk is one that's best avoided.

I haven't tested out how much of a problem this might be for your
patch, but I do know that resowner.c will call your shared mem segment
callback before closing any backend local vFDs, so I can't imagine how
it could be that this risk doesn't exist.

I wouldn't have expected anything like that to be a problem, because
FileClose() doesn't call FileAccess(). So IIUC it wouldn't ever try
to reopen a kernel fd just to close it.

The concern is that something somewhere does. For example, mdread()
calls FileRead(), which calls FileAccess(), ultimately because of some
obscure catalog access. It's very hard to reason about things like
that.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#55)

Re: WIP: [[Parallel] Shared] Hash

Hi,

SharedBufFile allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment. This includes
changes to fd.c and buffile.c to support new kinds of temporary file.

diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 4ca0ea4..a509c05 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c

I think the new facilities should be explained in the file's header.

@@ -68,9 +71,10 @@ struct BufFile
* avoid making redundant FileSeek calls.
*/

-	bool		isTemp;			/* can only add files if this is TRUE */
+	bool		isSegmented;	/* can only add files if this is TRUE */

That's a bit of a weird and uncommented upon change.

@@ -79,6 +83,8 @@ struct BufFile
*/
ResourceOwner resowner;

+ BufFileTag tag; /* for discoverability between backends */

Not perfectly happy with the name tag here, the name is a bit too
similar to BufferTag - something quite different.

+static void
+make_tagged_path(char *tempdirpath, char *tempfilepath,
+				 const BufFileTag *tag, int segment)
+{
+	if (tag->tablespace == DEFAULTTABLESPACE_OID ||
+		tag->tablespace == GLOBALTABLESPACE_OID)
+		snprintf(tempdirpath, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
+	else
+	{
+		snprintf(tempdirpath, MAXPGPATH, "pg_tblspc/%u/%s/%s",
+				 tag->tablespace, TABLESPACE_VERSION_DIRECTORY,
+				 PG_TEMP_FILES_DIR);
+	}
+
+	snprintf(tempfilepath, MAXPGPATH, "%s/%s%d.%d.%d.%d.%d", tempdirpath,
+			 PG_TEMP_FILE_PREFIX,
+			 tag->creator_pid, tag->set, tag->partition, tag->participant,
+			 segment);

Is there a risk that this ends up running afoul of filename length
limits on some platforms?

+}
+
+static File
+make_tagged_segment(const BufFileTag *tag, int segment)
+{
+	File		file;
+	char		tempdirpath[MAXPGPATH];
+	char		tempfilepath[MAXPGPATH];
+
+	/*
+	 * There is a remote chance that disk files with this (pid, set) pair
+	 * already exists after a crash-restart.  Since the presence of
+	 * consecutively numbered segment files is used by BufFileOpenShared to
+	 * determine the total size of a shared BufFile, we'll defend against
+	 * confusion by unlinking segment 1 (if it exists) before creating segment
+	 * 0.
+	 */

Gah. Why on earth aren't we removing temp files when restarting, not
just on the initial start? That seems completely wrong?

If we do decide not to change this: Why is that sufficient? Doesn't the
same problem exist for segments later than the first?

+/*
+ * Open a file that was previously created in another backend with
+ * BufFileCreateShared.
+ */
+BufFile *
+BufFileOpenTagged(const BufFileTag *tag)
+{
+	BufFile    *file = (BufFile *) palloc(sizeof(BufFile));
+	char		tempdirpath[MAXPGPATH];
+	char		tempfilepath[MAXPGPATH];
+	Size		capacity = 1024;
+	File	   *files = palloc(sizeof(File) * capacity);
+	int			nfiles = 0;
+
+	/*
+	 * We don't know how many segments there are, so we'll probe the
+	 * filesystem to find out.
+	 */
+	for (;;)
+	{
+		/* See if we need to expand our file space. */
+		if (nfiles + 1 > capacity)
+		{
+			capacity *= 2;
+			files = repalloc(files, sizeof(File) * capacity);
+		}
+		/* Try to load a segment. */
+		make_tagged_path(tempdirpath, tempfilepath, tag, nfiles);
+		files[nfiles] = PathNameOpenTemporaryFile(tempfilepath);
+		if (files[nfiles] <= 0)
+			break;

Isn't 0 a theoretically valid return value from
PathNameOpenTemporaryFile?

+/*
+ * Delete a BufFile that was created by BufFileCreateTagged.  Return true if
+ * at least one segment was deleted; false indicates that no segment was
+ * found, or an error occurred while trying to delete.  Errors are logged but
+ * the function returns normally because this is assumed to run in a clean-up
+ * path that might already involve an error.
+ */
+bool
+BufFileDeleteTagged(const BufFileTag *tag)
+{
+	char		tempdirpath[MAXPGPATH];
+	char		tempfilepath[MAXPGPATH];
+	int			segment = 0;
+	bool		found = false;
+
+	/*
+	 * We don't know if the BufFile really exists, because SharedBufFile
+	 * tracks only the range of file numbers.  If it does exists, we don't
+	 * know many 1GB segments it has, so we'll delete until we hit ENOENT or
+	 * an IO error.
+	 */
+	for (;;)
+	{
+		make_tagged_path(tempdirpath, tempfilepath, tag, segment);
+		if (!PathNameDeleteTemporaryFile(tempfilepath, false))
+			break;
+		found = true;
+		++segment;
+	}
+
+	return found;
+}

If we crash in the middle of this, we'll leave the later files abanded,
no?

+/*
+ * BufFileSetReadOnly --- flush and make read-only, in preparation for sharing
+ */
+void
+BufFileSetReadOnly(BufFile *file)
+{
+	BufFileFlush(file);
+	file->readOnly = true;
+}

That flag is unused, right?

+ * PathNameCreateTemporaryFile, PathNameOpenTemporaryFile and
+ * PathNameDeleteTemporaryFile are used for temporary files that may be shared
+ * between backends.  A File created or opened with these functions is not
+ * automatically deleted when the file is closed, but it is automatically
+ * closed and end of transaction and counts agains the temporary file limit of
+ * the backend that created it.  Any File created this way must be explicitly
+ * deleted with PathNameDeleteTemporaryFile.  Automatic file deletion is not
+ * provided because this interface is designed for use by buffile.c and
+ * indirectly by sharedbuffile.c to implement temporary files with shared
+ * ownership and cleanup.

Hm. Those name are pretty easy to misunderstand, no? s/Temp/Shared/?

 /*
+ * Called whenever a temporary file is deleted to report its size.
+ */
+static void
+ReportTemporaryFileUsage(const char *path, off_t size)
+{
+	pgstat_report_tempfile(size);
+
+	if (log_temp_files >= 0)
+	{
+		if ((size / 1024) >= log_temp_files)
+			ereport(LOG,
+					(errmsg("temporary file: path \"%s\", size %lu",
+							path, (unsigned long) size)));
+	}
+}

Man, the code for this sucks (not your fault). Shouldn't this properly
be at the buffile.c level, where we could implement limits above 1GB
properly?

+/*
+ * Open a file that was created with PathNameCreateTemporaryFile in another
+ * backend.  Files opened this way don't count agains the temp_file_limit of
+ * the caller, are read-only and are automatically closed at the end of the
+ * transaction but are not deleted on close.
+ */

This really reinforces my issues with the naming scheme. This ain't a
normal tempfile.

+File
+PathNameOpenTemporaryFile(char *tempfilepath)
+{
+	File file;
+
+	/*
+	 * Open the file.  Note: we don't use O_EXCL, in case there is an orphaned
+	 * temp file that can be reused.
+	 */
+	file = PathNameOpenFile(tempfilepath, O_RDONLY | PG_BINARY, 0);

If so, wouldn't we need to truncate the file?

+ * A single SharedBufFileSet can manage any number of 'tagged' BufFiles that
+ * are shared between a fixed number of participating backends.  Each shared
+ * BufFile can be written to by a single participant but can be read by any
+ * backend after it has been 'exported'.  Once a given BufFile is exported, it
+ * becomes read-only and cannot be extended.  To create a new shared BufFile,
+ * a participant needs its own distinct participant number, and needs to
+ * specify an arbitrary partition number for the file.  To make it available
+ * to other backends, it must be explicitly exported, which flushes internal
+ * buffers and renders it read-only.  To open a file that has been shared, a
+ * backend needs to know the number of the participant that created the file,
+ * and the partition number.  It is the responsibily of calling code to ensure
+ * that files are not accessed before they have been shared.

Hm. One way to make this safer would be to rename files when exporting.
Should be sufficient to do this to the first segment, I guess.

+ * Each file is identified by a partition number and a participant number, so
+ * that a SharedBufFileSet can be viewed as a 2D table of individual files.

I think using "files" as a term here is a bit dangerous - they're
individually segmented again, right?

+/*
+ * The number of bytes of shared memory required to construct a
+ * SharedBufFileSet.
+ */
+Size
+SharedBufFileSetSize(int participants)
+{
+	return offsetof(SharedBufFileSet, participants) +
+		sizeof(SharedBufFileParticipant) * participants;
+}

The function name sounds a bit like a function actuallize setting some
size... s/Size/DetermineSize/?

+/*
+ * Create a new file suitable for sharing.  Each backend that calls this must
+ * use a distinct participant number.  Behavior is undefined if a participant
+ * calls this more than once for the same partition number.  Partitions should
+ * ideally be numbered consecutively or in as small a range as possible,
+ * because file cleanup will scan the range of known partitions looking for
+ * files.
+ */

Wonder if we shouldn't just create a directory for all such files.

I'm a bit unhappy with the partition terminology around this. It's
getting a bit confusing. We have partitions, participants and
segements. Most of them could be understood for something entirely
different than the meaning you have here...

+static void
+shared_buf_file_on_dsm_detach(dsm_segment *segment, Datum datum)
+{
+	bool unlink_files = false;
+	SharedBufFileSet *set = (SharedBufFileSet *) DatumGetPointer(datum);
+
+	SpinLockAcquire(&set->mutex);
+	Assert(set->refcount > 0);
+	if (--set->refcount == 0)
+		unlink_files = true;
+	SpinLockRelease(&set->mutex);

I'm a bit uncomfortable with releasing a refcount, and then still using
the memory from the set... I don't think there's a concrete danger
here as the code stands, but it's a fairly dangerous pattern.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#55)

Re: WIP: [[Parallel] Shared] Hash

On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:

Here is a new patch series responding to feedback from Peter and Andres:

+
+/* Per-participant shared state. */
+typedef struct SharedTuplestoreParticipant
+{
+	LWLock lock;

Hm. No padding (ala LWLockMinimallyPadded / LWLockPadded) - but that's
probably ok, for now.

+	bool error;					/* Error occurred flag. */
+	bool eof;					/* End of file reached. */
+	int read_fileno;			/* BufFile segment file number. */
+	off_t read_offset;			/* Offset within segment file. */

Hm. I wonder if it'd not be better to work with 64bit offsets, and just
separate that out upon segment access.

+/* The main data structure in shared memory. */

"main data structure" isn't particularly meaningful.

+struct SharedTuplestore
+{
+	int reading_partition;
+	int nparticipants;
+	int flags;

Maybe add a comment saying /* flag bits from SHARED_TUPLESTORE_* */?

+ Size meta_data_size;

What's this?

+ SharedTuplestoreParticipant participants[FLEXIBLE_ARRAY_MEMBER];

I'd add a comment here, that there's further data after participants.

+};

+
+/* Per-participant backend-private state. */
+struct SharedTuplestoreAccessor
+{

Hm. The name and it being backend-local are a bit conflicting.

+	int participant;			/* My partitipant number. */
+	SharedTuplestore *sts;		/* The shared state. */
+	int nfiles;					/* Size of local files array. */
+	BufFile **files;			/* Files we have open locally for writing. */

Shouldn't this mention that it's indexed by partition?

+	BufFile *read_file;			/* The current file to read from. */
+	int read_partition;			/* The current partition to read from. */
+	int read_participant;		/* The current participant to read from. */
+	int read_fileno;			/* BufFile segment file number. */
+	off_t read_offset;			/* Offset within segment file. */
+};

+/*
+ * Initialize a SharedTuplestore in existing shared memory.  There must be
+ * space for sts_size(participants) bytes.  If flags is set to the value
+ * SHARED_TUPLESTORE_SINGLE_PASS then each partition may only be read once,
+ * because underlying files will be deleted.

Any reason not to use flags that are compatible with tuplestore.c?

+ * Tuples that are stored may optionally carry a piece of fixed sized
+ * meta-data which will be retrieved along with the tuple.  This is useful for
+ * the hash codes used for multi-batch hash joins, but could have other
+ * applications.
+ */
+SharedTuplestoreAccessor *
+sts_initialize(SharedTuplestore *sts, int participants,
+			   int my_participant_number,
+			   Size meta_data_size,
+			   int flags,
+			   dsm_segment *segment)
+{

Not sure I like that the naming here has little in common with
tuplestore.h's api.

+
+MinimalTuple
+sts_gettuple(SharedTuplestoreAccessor *accessor, void *meta_data)
+{

This needs docs.

+	SharedBufFileSet *fileset = GetSharedBufFileSet(accessor->sts);
+	MinimalTuple tuple = NULL;
+
+	for (;;)
+	{

...
+		/* Check if this participant's file has already been entirely read. */
+		if (participant->eof)
+		{
+			BufFileClose(accessor->read_file);
+			accessor->read_file = NULL;
+			LWLockRelease(&participant->lock);
+			continue;

Why are we closing the file while holding the lock?

+
+		/* Read the optional meta-data. */
+		eof = false;
+		if (accessor->sts->meta_data_size > 0)
+		{
+			nread = BufFileRead(accessor->read_file, meta_data,
+								accessor->sts->meta_data_size);
+			if (nread == 0)
+				eof = true;
+			else if (nread != accessor->sts->meta_data_size)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read from temporary file: %m")));
+		}
+
+		/* Read the size. */
+		if (!eof)
+		{
+			nread = BufFileRead(accessor->read_file, &tuple_size, sizeof(tuple_size));
+			if (nread == 0)
+				eof = true;

Why is it legal to have EOF here, if metadata previously didn't have an
EOF? Perhaps add an error if accessor->sts->meta_data_size != 0?

+		if (eof)
+		{
+			participant->eof = true;
+			if ((accessor->sts->flags & SHARED_TUPLESTORE_SINGLE_PASS) != 0)
+				SharedBufFileDestroy(fileset, accessor->read_partition,
+									 accessor->read_participant);
+
+			participant->error = false;
+			LWLockRelease(&participant->lock);
+
+			/* Move to next participant's file. */
+			BufFileClose(accessor->read_file);
+			accessor->read_file = NULL;
+			continue;
+		}
+
+		/* Read the tuple. */
+		tuple = (MinimalTuple) palloc(tuple_size);
+		tuple->t_len = tuple_size;

Hm. Constantly re-allocing this doesn't strike me as a good idea (not to
mention that the API doesn't mention this is newly allocated). Seems
like it'd be a better idea to have a per-accessor buffer where this can
be stored in - increased in size when necessary.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#61

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#59)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Mar 27, 2017 at 9:41 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

SharedBufFile allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment. This includes
changes to fd.c and buffile.c to support new kinds of temporary file.
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 4ca0ea4..a509c05 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
I think the new facilities should be explained in the file's header.

Will do.

@@ -68,9 +71,10 @@ struct BufFile
* avoid making redundant FileSeek calls.
*/
-       bool            isTemp;                 /* can only add files if this is TRUE */
+       bool            isSegmented;    /* can only add files if this is TRUE */
That's a bit of a weird and uncommented upon change.

I was trying to cut down on the number of places we use the word
'temporary' to activate various different behaviours. In this case,
the only thing it controls is whether the BufFile is backed by one
single fd.c File or many segments, so I figured it should be renamed.

As Peter and you have pointed out, there may be a case for removing it
altogether.

@@ -79,6 +83,8 @@ struct BufFile
*/
ResourceOwner resowner;

+ BufFileTag tag; /* for discoverability between backends */

Not perfectly happy with the name tag here, the name is a bit too
similar to BufferTag - something quite different.

Yeah, will rename.

+static void
+make_tagged_path(char *tempdirpath, char *tempfilepath,
+                                const BufFileTag *tag, int segment)
+{
+       if (tag->tablespace == DEFAULTTABLESPACE_OID ||
+               tag->tablespace == GLOBALTABLESPACE_OID)
+               snprintf(tempdirpath, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
+       else
+       {
+               snprintf(tempdirpath, MAXPGPATH, "pg_tblspc/%u/%s/%s",
+                                tag->tablespace, TABLESPACE_VERSION_DIRECTORY,
+                                PG_TEMP_FILES_DIR);
+       }
+
+       snprintf(tempfilepath, MAXPGPATH, "%s/%s%d.%d.%d.%d.%d", tempdirpath,
+                        PG_TEMP_FILE_PREFIX,
+                        tag->creator_pid, tag->set, tag->partition, tag->participant,
+                        segment);

Is there a risk that this ends up running afoul of filename length
limits on some platforms?

Hmm. I didn't think so. Do we have a project guideline on maximum
path lengths based on some kind of survey? There are various limits
involved (filesystem and OS per-path-component limits, total limits,
and the confusing PATH_MAX, MAX_PATH etc macros), but I was under the
impression that these numbers were always at least 255. This scheme
seems capable of producing ~50 bytes in the final component
(admittedly more if int is 64 bits), and then nowhere near enough to
reach a limit of that order in the earlier components.

+}
+
+static File
+make_tagged_segment(const BufFileTag *tag, int segment)
+{
+       File            file;
+       char            tempdirpath[MAXPGPATH];
+       char            tempfilepath[MAXPGPATH];
+
+       /*
+        * There is a remote chance that disk files with this (pid, set) pair
+        * already exists after a crash-restart.  Since the presence of
+        * consecutively numbered segment files is used by BufFileOpenShared to
+        * determine the total size of a shared BufFile, we'll defend against
+        * confusion by unlinking segment 1 (if it exists) before creating segment
+        * 0.
+        */

Gah. Why on earth aren't we removing temp files when restarting, not
just on the initial start? That seems completely wrong?

See the comment above RemovePgTempFiles in fd.c. From comments on
this list I understand that this is a subject that Robert and Tom
don't agree on. I don't mind either way, but as long as
RemovePgTempFiles works that way and my patch uses the existence of
files to know how many files there are, I have to defend against that
danger by making sure that I don't accidentally identify files from
before a crash/restart as active.

If we do decide not to change this: Why is that sufficient? Doesn't the
same problem exist for segments later than the first?

It does exist and it is handled. The comment really should say
"unlinking segment N + 1 (if it exists) before creating segment N".
Will update.

+/*
+ * Open a file that was previously created in another backend with
+ * BufFileCreateShared.
+ */
+BufFile *
+BufFileOpenTagged(const BufFileTag *tag)
+{
+       BufFile    *file = (BufFile *) palloc(sizeof(BufFile));
+       char            tempdirpath[MAXPGPATH];
+       char            tempfilepath[MAXPGPATH];
+       Size            capacity = 1024;
+       File       *files = palloc(sizeof(File) * capacity);
+       int                     nfiles = 0;
+
+       /*
+        * We don't know how many segments there are, so we'll probe the
+        * filesystem to find out.
+        */
+       for (;;)
+       {
+               /* See if we need to expand our file space. */
+               if (nfiles + 1 > capacity)
+               {
+                       capacity *= 2;
+                       files = repalloc(files, sizeof(File) * capacity);
+               }
+               /* Try to load a segment. */
+               make_tagged_path(tempdirpath, tempfilepath, tag, nfiles);
+               files[nfiles] = PathNameOpenTemporaryFile(tempfilepath);
+               if (files[nfiles] <= 0)
+                       break;

Isn't 0 a theoretically valid return value from
PathNameOpenTemporaryFile?

I was confused by that too, because it isn't the way normal OS fds
work. But existing code dealing with Postgres vfd return values
treats 0 as an error. See for example OpenTemporaryFile and
OpenTemporaryFileInTablespace.

+/*
+ * Delete a BufFile that was created by BufFileCreateTagged.  Return true if
+ * at least one segment was deleted; false indicates that no segment was
+ * found, or an error occurred while trying to delete.  Errors are logged but
+ * the function returns normally because this is assumed to run in a clean-up
+ * path that might already involve an error.
+ */
+bool
+BufFileDeleteTagged(const BufFileTag *tag)
+{
+       char            tempdirpath[MAXPGPATH];
+       char            tempfilepath[MAXPGPATH];
+       int                     segment = 0;
+       bool            found = false;
+
+       /*
+        * We don't know if the BufFile really exists, because SharedBufFile
+        * tracks only the range of file numbers.  If it does exists, we don't
+        * know many 1GB segments it has, so we'll delete until we hit ENOENT or
+        * an IO error.
+        */
+       for (;;)
+       {
+               make_tagged_path(tempdirpath, tempfilepath, tag, segment);
+               if (!PathNameDeleteTemporaryFile(tempfilepath, false))
+                       break;
+               found = true;
+               ++segment;
+       }
+
+       return found;
+}

If we crash in the middle of this, we'll leave the later files abanded,
no?

Yes. In general, there are places we can crash or unplug the server
etc and leave files behind. In that case, RemovePgTempFiles cleans up
(or declines to do so deliberately to support debugging, as
discussed).

+/*
+ * BufFileSetReadOnly --- flush and make read-only, in preparation for sharing
+ */
+void
+BufFileSetReadOnly(BufFile *file)
+{
+       BufFileFlush(file);
+       file->readOnly = true;
+}

That flag is unused, right?

It's used for an assertion in BufFileWrite. Maybe could be
elog(ERROR, ...) instead, but either way it's a debugging aid to
report misuse.

+ * PathNameCreateTemporaryFile, PathNameOpenTemporaryFile and
+ * PathNameDeleteTemporaryFile are used for temporary files that may be shared
+ * between backends.  A File created or opened with these functions is not
+ * automatically deleted when the file is closed, but it is automatically
+ * closed and end of transaction and counts agains the temporary file limit of
+ * the backend that created it.  Any File created this way must be explicitly
+ * deleted with PathNameDeleteTemporaryFile.  Automatic file deletion is not
+ * provided because this interface is designed for use by buffile.c and
+ * indirectly by sharedbuffile.c to implement temporary files with shared
+ * ownership and cleanup.

Hm. Those name are pretty easy to misunderstand, no? s/Temp/Shared/?

Hmm. Yeah these may be better. Will think about that.

/*
+ * Called whenever a temporary file is deleted to report its size.
+ */
+static void
+ReportTemporaryFileUsage(const char *path, off_t size)
+{
+       pgstat_report_tempfile(size);
+
+       if (log_temp_files >= 0)
+       {
+               if ((size / 1024) >= log_temp_files)
+                       ereport(LOG,
+                                       (errmsg("temporary file: path \"%s\", size %lu",
+                                                       path, (unsigned long) size)));
+       }
+}

Man, the code for this sucks (not your fault). Shouldn't this properly
be at the buffile.c level, where we could implement limits above 1GB
properly?

+/*
+ * Open a file that was created with PathNameCreateTemporaryFile in another
+ * backend.  Files opened this way don't count agains the temp_file_limit of
+ * the caller, are read-only and are automatically closed at the end of the
+ * transaction but are not deleted on close.
+ */

This really reinforces my issues with the naming scheme. This ain't a
normal tempfile.

It sort of makes sense if you consider that a 'named' temporary file
is different... but yeah, point taken.

+File
+PathNameOpenTemporaryFile(char *tempfilepath)
+{
+       File file;
+
+       /*
+        * Open the file.  Note: we don't use O_EXCL, in case there is an orphaned
+        * temp file that can be reused.
+        */
+       file = PathNameOpenFile(tempfilepath, O_RDONLY | PG_BINARY, 0);

If so, wouldn't we need to truncate the file?

Yes, this lacks O_TRUNC. Thanks.

+ * A single SharedBufFileSet can manage any number of 'tagged' BufFiles that
+ * are shared between a fixed number of participating backends.  Each shared
+ * BufFile can be written to by a single participant but can be read by any
+ * backend after it has been 'exported'.  Once a given BufFile is exported, it
+ * becomes read-only and cannot be extended.  To create a new shared BufFile,
+ * a participant needs its own distinct participant number, and needs to
+ * specify an arbitrary partition number for the file.  To make it available
+ * to other backends, it must be explicitly exported, which flushes internal
+ * buffers and renders it read-only.  To open a file that has been shared, a
+ * backend needs to know the number of the participant that created the file,
+ * and the partition number.  It is the responsibily of calling code to ensure
+ * that files are not accessed before they have been shared.

Hm. One way to make this safer would be to rename files when exporting.
Should be sufficient to do this to the first segment, I guess.

Interesting idea. Will think about that. That comment isn't great
and repeats itself. Will improve.

+ * Each file is identified by a partition number and a participant number, so
+ * that a SharedBufFileSet can be viewed as a 2D table of individual files.
I think using "files" as a term here is a bit dangerous - they're
individually segmented again, right?

True. It's a 2D matrix of BufFiles. The word "file" is super
overloaded here. Will fix.

+/*
+ * The number of bytes of shared memory required to construct a
+ * SharedBufFileSet.
+ */
+Size
+SharedBufFileSetSize(int participants)
+{
+       return offsetof(SharedBufFileSet, participants) +
+               sizeof(SharedBufFileParticipant) * participants;
+}

The function name sounds a bit like a function actuallize setting some
size... s/Size/DetermineSize/?

Hmm yeah "set" as verb vs "set" as noun. I think "estimate" is the
established word for this sort of thing (even though that seems
strange because it sounds like it doesn't have to be exactly right:
clearly in all these shmem-space-reservation functions it has to be
exactly right). Will change.

+/*
+ * Create a new file suitable for sharing.  Each backend that calls this must
+ * use a distinct participant number.  Behavior is undefined if a participant
+ * calls this more than once for the same partition number.  Partitions should
+ * ideally be numbered consecutively or in as small a range as possible,
+ * because file cleanup will scan the range of known partitions looking for
+ * files.
+ */

Wonder if we shouldn't just create a directory for all such files.

Hmm. Yes, that could work well. Will try that.

I'm a bit unhappy with the partition terminology around this. It's
getting a bit confusing. We have partitions, participants and
segements. Most of them could be understood for something entirely
different than the meaning you have here...

Ok. Let me try to explain and defend them and see if we can come up
with something better.

1. Segments are what buffile.c already calls the individual
capped-at-1GB files that it manages. They are an implementation
detail that is not part of buffile.c's user interface. There seems to
be no reason to change that.

My understanding is that this was done to support pre-large-file
filesystems/OSs which limited files to 2^31 or 2^32 bytes, and we
decided to cap the segments at 1GB (maybe some ancient OS had a 2^30
limit, or maybe it was just a conservative choice that's easy for
humans to think about). We could perhaps get rid of that entirely
today without anyone complaining and just use one big file, though
don't know that and I'm not suggesting it. (One argument against that
is that the parallel CREATE INDEX patch actually makes use of the
segmented nature of BufFiles to splice them together, to 'unify' a
bunch of worker LogicalTapeSets to create one LogicalTapeSet. That's
off topic here but it's in the back of my mind as a potential client
of this code. I'll have more to say about that over on the parallel
CREATE INDEX thread shortly.)

2. Partitions here = 'batches'. The 'batches' used by the hash join
code are formally partitions in all the literature on hash joins, and
I bet that anyone else doing parallel work that involves sharing
temporary disk files will run into the need for partitioning. I think
you are complaining that we now have a database object called a
PARTITION, and that may be a problem because we're overloading the
term. But it's the same name because it's mathematically the same
thing. We don't complain about the existence of 'lock tables' or
'hash tables' just because there is a database object called a TABLE.
I considered other names for this, like "file number", but it was
confusing and vague. I keep coming back to "partition" for this,
because fundamentally this is for partitioning temporary data. I
could maybe call it "file_partition" or something?

3. Participants are what I have taken to calling the processes
involved in parallel query, when I mean the larger set that includes
workers + leader. It may seem a little odd that such a thing appears
in an API that deals with temporary files. But the basic idea here is
that each participant gets to write out its own partial results, for
each partition. Stepping back a bit, that means that there are two
kinds of partitioning going on at the same time. Partitioning the
keyspace into batch numbers, and then the arbitrary partitioning that
comes from each participant processing partial plans. This is how
SharedBufFileSet finishes up managing a 2D matrix of BufFiles.

You might argue that buffile.c shouldn't know about partitions and
participants. The only thing I really need here is for BufFileTag (to
be renamed) to be able to name things differently. Perhaps it should
just include a char[] buffer for a name fragment, and the
SharedBufFileSet should encode the partition and participant numbers
into it, rather than exposing these rather higher level concepts to
buffile.c. I will think about that.

(Perhaps SharedBufFileSet should be called PartitionedBufFileSet?)

+static void
+shared_buf_file_on_dsm_detach(dsm_segment *segment, Datum datum)
+{
+       bool unlink_files = false;
+       SharedBufFileSet *set = (SharedBufFileSet *) DatumGetPointer(datum);
+
+       SpinLockAcquire(&set->mutex);
+       Assert(set->refcount > 0);
+       if (--set->refcount == 0)
+               unlink_files = true;
+       SpinLockRelease(&set->mutex);
I'm a bit uncomfortable with releasing a refcount, and then still using
the memory from the set... I don't think there's a concrete danger
here as the code stands, but it's a fairly dangerous pattern.

Will fix.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#61)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Mar 27, 2017 at 11:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> >> Is there a risk that this ends up
running afoul of filename length

limits on some platforms?

Hmm. I didn't think so. Do we have a project guideline on maximum
path lengths based on some kind of survey? There are various limits
involved (filesystem and OS per-path-component limits, total limits,
and the confusing PATH_MAX, MAX_PATH etc macros), but I was under the
impression that these numbers were always at least 255. This scheme
seems capable of producing ~50 bytes in the final component
(admittedly more if int is 64 bits), and then nowhere near enough to
reach a limit of that order in the earlier components.

Err, plus prefix. Still seems unlikely to be too long.

I'm a bit unhappy with the partition terminology around this. It's
getting a bit confusing. We have partitions, participants and
segements. Most of them could be understood for something entirely
different than the meaning you have here...

Ok. Let me try to explain and defend them and see if we can come up
with something better.

1. Segments are what buffile.c already calls the individual
capped-at-1GB files that it manages. They are an implementation
detail that is not part of buffile.c's user interface. There seems to
be no reason to change that.

After reading your next email I realised this is not quite true:
BufFileTell and BufFileSeek expose the existence of segments.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#63

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#62)

Re: WIP: [[Parallel] Shared] Hash

On Sun, Mar 26, 2017 at 3:41 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

1. Segments are what buffile.c already calls the individual
capped-at-1GB files that it manages. They are an implementation
detail that is not part of buffile.c's user interface. There seems to
be no reason to change that.

After reading your next email I realised this is not quite true:
BufFileTell and BufFileSeek expose the existence of segments.

Yeah, that's something that tuplestore.c itself relies on.

I always thought that the main reason practical why we have BufFile
multiplex 1GB segments concerns use of temp_tablespaces, rather than
considerations that matter only when using obsolete file systems:

/*
* We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
* The reason is that we'd like large temporary BufFiles to be spread across
* multiple tablespaces when available.
*/

Now, I tend to think that most installations that care about
performance would be better off using RAID to stripe their one temp
tablespace file system. But, I suppose this still makes sense when you
have a number of file systems that happen to be available, and disk
capacity is the main concern. PHJ uses one temp tablespace per worker,
which I further suppose might not be as effective in balancing disk
space usage.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#63)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Mar 27, 2017 at 12:12 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Sun, Mar 26, 2017 at 3:41 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

1. Segments are what buffile.c already calls the individual
capped-at-1GB files that it manages. They are an implementation
detail that is not part of buffile.c's user interface. There seems to
be no reason to change that.

After reading your next email I realised this is not quite true:
BufFileTell and BufFileSeek expose the existence of segments.

Yeah, that's something that tuplestore.c itself relies on.

I always thought that the main reason practical why we have BufFile
multiplex 1GB segments concerns use of temp_tablespaces, rather than
considerations that matter only when using obsolete file systems:

/*
* We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
* The reason is that we'd like large temporary BufFiles to be spread across
* multiple tablespaces when available.
*/

Now, I tend to think that most installations that care about
performance would be better off using RAID to stripe their one temp
tablespace file system. But, I suppose this still makes sense when you
have a number of file systems that happen to be available, and disk
capacity is the main concern. PHJ uses one temp tablespace per worker,
which I further suppose might not be as effective in balancing disk
space usage.

I was thinking about IO bandwidth balance rather than size. If you
rotate through tablespaces segment-by-segment, won't you be exposed to
phasing effects that could leave disk arrays idle for periods of time?
Whereas if you assign them to participants, you can only get idle
arrays if you have fewer participants than tablespaces.

This seems like a fairly complex subtopic and I don't have a strong
view on it. Clearly you could rotate through tablespaces on the basis
of participant, partition, segment, some combination, or something
else. Doing it by participant seemed to me to be the least prone to
IO imbalance cause by phasing effects (= segment based) or data
distribution (= partition based), of the options I considered when I
wrote it that way.

Like you, I also tend to suspect that people would be more likely to
use RAID type technologies to stripe things like this for both
bandwidth and space reasons these days. Tablespaces seem to make more
sense as a way of separating different classes of storage
(fast/expensive, slow/cheap etc), not as an IO or space striping
technique. I may be way off base there though...

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#65

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Thomas Munro (#64)

Re: WIP: [[Parallel] Shared] Hash

On Sun, Mar 26, 2017 at 6:50 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Like you, I also tend to suspect that people would be more likely to
use RAID type technologies to stripe things like this for both
bandwidth and space reasons these days. Tablespaces seem to make more
sense as a way of separating different classes of storage
(fast/expensive, slow/cheap etc), not as an IO or space striping
technique.

I agree.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#66

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#57)

Re: WIP: [[Parallel] Shared] Hash

On Sun, Mar 26, 2017 at 3:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

But... what you said above must be a problem for Windows. I believe
it doesn't allow files to be unlinked if they are open, and I see that
DSM segments are cleaned up in resowner's phase ==
RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
RESOURCE_RELEASE_AFTER_LOCKS.

I thought this last point about Windows might be fatal to my design,
but it seems that Windows since at least version 2000 has support for
Unixoid unlinkability via the special flag FILE_SHARE_DELETE.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#67

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#55)

Re: WIP: [[Parallel] Shared] Hash

On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:

Here is a new patch series responding to feedback from Peter and Andres:

Here's a review of 0007 & 0010 together - they're going to have to be
applied together anyway...

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ac339fb566..775c9126c7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3814,6 +3814,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-cpu-shared-tuple-cost" xreflabel="cpu_shared_tuple_cost">
+      <term><varname>cpu_shared_tuple_cost</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>cpu_shared_tuple_cost</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the cost of sharing rows in
+        memory during a parallel query.
+        The default is 0.001.
+       </para>
+      </listitem>
+     </varlistentry>
+

Isn't that really low in comparison to the other costs? I think
specifying a bit more what this actually measures would be good too - is
it putting the tuple in shared memory? Is it accessing it?

+     <varlistentry id="guc-cpu-synchronization-cost" xreflabel="cpu_synchronization_cost">
+      <term><varname>cpu_synchronization_cost</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>cpu_synchronization_cost</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the cost of waiting at synchronization
+        points for other processes while executing parallel queries.
+        The default is 1.0.
+       </para>
+      </listitem>
+     </varlistentry>

Isn't this also really cheap in comparison to a, probably cached, seq
page read?

+	if (HashJoinTableIsShared(hashtable))
+	{
+		/*
+		 * Synchronize parallel hash table builds.  At this stage we know that
+		 * the shared hash table has been created, but we don't know if our
+		 * peers are still in MultiExecHash and if so how far through.  We use
+		 * the phase to synchronize with them.
+		 */
+		barrier = &hashtable->shared->barrier;
+
+		switch (BarrierPhase(barrier))
+		{
+		case PHJ_PHASE_BEGINNING:

Note pgindent will indent this further. Might be worthwhile to try to
pgindent the file, revert some of the unintended damage.

/*
* set expression context
*/

I'd still like this to be moved to the start.

@@ -126,17 +202,79 @@ MultiExecHash(HashState *node)
 				/* Not subject to skew optimization, so insert normally */
 				ExecHashTableInsert(hashtable, slot, hashvalue);
 			}
-			hashtable->totalTuples += 1;
+			hashtable->partialTuples += 1;
+			if (!HashJoinTableIsShared(hashtable))
+				hashtable->totalTuples += 1;
 		}
 	}

FWIW, I'd put HashJoinTableIsShared() into a local var - the compiler
won't be able to do that on its own because external function calls
could invalidate the result.

That brings me to a related topic: Have you measured whether your
changes cause performance differences?

+ finish_loading(hashtable);

I find the sudden switch to a different naming scheme in the same file a
bit jarring.

+	if (HashJoinTableIsShared(hashtable))
+		BarrierDetach(&hashtable->shared->shrink_barrier);
+
+	if (HashJoinTableIsShared(hashtable))
+	{

Consecutive if blocks with the same condition...

+		bool elected_to_resize;
+
+		/*
+		 * Wait for all backends to finish building.  If only one worker is
+		 * running the building phase because of a non-partial inner plan, the
+		 * other workers will pile up here waiting.  If multiple worker are
+		 * building, they should finish close to each other in time.
+		 */

That comment is outdated, isn't it?

 	/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-	if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-		ExecHashIncreaseNumBuckets(hashtable);
+	ExecHashUpdate(hashtable);
+	ExecHashIncreaseNumBuckets(hashtable);

So this now doesn't actually increase the number of buckets anymore.

+ reinsert:
+	/* If the table was resized, insert tuples into the new buckets. */
+	ExecHashUpdate(hashtable);
+	ExecHashReinsertAll(hashtable);

ReinsertAll just happens to do nothing if we didn't have to
resize... Not entirely obvious, sure reads as if it were unconditional.
Also, it's not actually "All" when batching is in use, no?

+ post_resize:
+	if (HashJoinTableIsShared(hashtable))
+	{
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+		BarrierWait(barrier, WAIT_EVENT_HASH_RESIZING);
+		Assert(BarrierPhase(barrier) == PHJ_PHASE_REINSERTING);
+	}
+
+ reinsert:
+	/* If the table was resized, insert tuples into the new buckets. */
+	ExecHashUpdate(hashtable);
+	ExecHashReinsertAll(hashtable);

Hm. So even non-resizing backends reach this - but they happen to not
do anything because there's no work queued up, right? That's, uh, not
obvious.

For me the code here would be a good bit easier to read if we had a
MultiExecHash and MultiExecParallelHash. Half of MultiExecHash is just
if(IsShared) blocks, and copying would avoid potential slowdowns.

+		/*
+		 * Set up for skew optimization, if possible and there's a need for
+		 * more than one batch.  (In a one-batch join, there's no point in
+		 * it.)
+		 */
+		if (nbatch > 1)
+			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);

So there's no equivalent to the skew optimization for parallel query
yet... It doesn't sound like that should be particulalry hard on first
blush?

 static void
-ExecHashIncreaseNumBatches(HashJoinTable hashtable)
+ExecHashIncreaseNumBatches(HashJoinTable hashtable, int nbatch)

So this doesn't actually increase the number of batches anymore... At
the very least this should mention that the main work is done in
ExecHashShrink.

+/*
+ * Process the queue of chunks whose tuples need to be redistributed into the
+ * correct batches until it is empty.  In the best case this will shrink the
+ * hash table, keeping about half of the tuples in memory and sending the rest
+ * to a future batch.
+ */
+static void
+ExecHashShrink(HashJoinTable hashtable)

Should mention this really only is meaningful after
ExecHashIncreaseNumBatches has run.

+{
+	long		ninmemory;
+	long		nfreed;
+	dsa_pointer chunk_shared;
+	HashMemoryChunk chunk;

-	/* If know we need to resize nbuckets, we can do it while rebatching. */
-	if (hashtable->nbuckets_optimal != hashtable->nbuckets)
+	if (HashJoinTableIsShared(hashtable))
 	{
-		/* we never decrease the number of buckets */
-		Assert(hashtable->nbuckets_optimal > hashtable->nbuckets);
+		/*
+		 * Since a newly launched participant could arrive while shrinking is
+		 * already underway, we need to be able to jump to the correct place
+		 * in this function.
+		 */
+		switch (PHJ_SHRINK_PHASE(BarrierPhase(&hashtable->shared->shrink_barrier)))
+		{
+		case PHJ_SHRINK_PHASE_BEGINNING: /* likely case */
+			break;
+		case PHJ_SHRINK_PHASE_CLEARING:
+			goto clearing;
+		case PHJ_SHRINK_PHASE_WORKING:
+			goto working;
+		case PHJ_SHRINK_PHASE_DECIDING:
+			goto deciding;
+		}

Hm, so we jump into different nesting levels here :/

ok, ENOTIME for today...

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index f2c885afbe..87d8f3766e 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -6,10 +6,78 @@
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- *
  * IDENTIFICATION
  *	  src/backend/executor/nodeHashjoin.c
  *
+ * NOTES:
+ *
+ * PARALLELISM
+ *
+ * Hash joins can participate in parallel queries in two ways: in
+ * non-parallel-aware mode, where each backend builds an identical hash table
+ * and then probes it with a partial outer relation, or parallel-aware mode
+ * where there is a shared hash table that all participants help to build.  A
+ * parallel-aware hash join can save time and space by dividing the work up
+ * and sharing the result, but has extra communication overheads.

There's a third, right? The hashjoin, and everything below it, could
also not be parallel, but above it could be some parallel aware node
(e.g. a parallel aware HJ).

+ * In both cases, hash joins use a private state machine to track progress
+ * through the hash join algorithm.

That's not really parallel specific, right? Perhaps just say that
parallel HJs use the normal state machine?

+ * In a parallel-aware hash join, there is also a shared 'phase' which
+ * co-operating backends use to synchronize their local state machine and
+ * program counter with the multi-process join.  The phase is managed by a
+ * 'barrier' IPC primitive.

Hm. I wonder if 'phase' shouldn't just be name
sharedHashJoinState. Might be a bit easier to understand than a
different terminology.

+ * The phases are as follows:
+ *
+ *   PHJ_PHASE_BEGINNING   -- initial phase, before any participant acts
+ *   PHJ_PHASE_CREATING	   -- one participant creates the shmem hash table
+ *   PHJ_PHASE_BUILDING	   -- all participants build the hash table
+ *   PHJ_PHASE_RESIZING	   -- one participant decides whether to expand buckets
+ *   PHJ_PHASE_REINSERTING -- all participants reinsert tuples if necessary
+ *   PHJ_PHASE_PROBING	   -- all participants probe the hash table
+ *   PHJ_PHASE_UNMATCHED   -- all participants scan for unmatched tuples

I think somewhere here - and probably around the sites it's happening -
should mention that state transitions are done kinda implicitly via
BarrierWait progressing to the numerically next phase. That's not
entirely obvious (and actually limits what the barrier mechanism can be
used for...).

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#68

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#66)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Mon, Mar 27, 2017 at 12:20 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sun, Mar 26, 2017 at 3:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

But... what you said above must be a problem for Windows. I believe
it doesn't allow files to be unlinked if they are open, and I see that
DSM segments are cleaned up in resowner's phase ==
RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
RESOURCE_RELEASE_AFTER_LOCKS.

I thought this last point about Windows might be fatal to my design,
but it seems that Windows since at least version 2000 has support for
Unixoid unlinkability via the special flag FILE_SHARE_DELETE.

On testing v10 of this patch over commit
b54aad8e34bd6299093e965c50f4a23da96d7cc3 and applying the tweak
mentioned in [1]/messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/, for TPC-H queries I found the results quite
encouraging,

Experimental setup:
TPC-H scale factor - 20
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
random_page_cost = seq_page_cost = 0.1
max_parallel_workers_per_gather = 4

Performance numbers:
(Time in seconds)
Query | Head | Patch |
-------------------------------
Q3 | 73 | 37 |
Q5 | 56 | 31 |
Q7 | 40 | 30 |
Q8 | 8 | 8 |
Q9 | 85 | 42 |
Q10 | 86 | 46 |
Q14 | 11 | 6 |
Q16 | 32 | 11 |
Q21 | 53 | 56 |

Please find the attached file for the explain analyse output of these
queries on head as well as patch.
Would be working on analysing the performance of this patch on 300 scale factor.

[1]: /messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

ph_v10_performance.tar.gzapplication/x-gzip; name=ph_v10_performance.tar.gzDownload

�n��X������J�L����3?�sy`:�����8�D��f<��q�`��hu7d��Q�c���~?�y�s������������h/]:���keV��~���w�������;���1������TK����(�����j���	���Q&������(�����k���w��~n��3���������������W��)?�����~u���v���������	j8���4�]���7��������O��hw������������j�#�PB�0~���WW�����>��n������8�6����;:�8����.=.��/��|��<��xuI.�]��<�����������������]������?��?��������z�������������i����������<�������<{�W����>�����������W�/_]�t���g�WP6���<{q���wv������>���J��W(-n#����D��8L6_�s� �@_^�z}up��OE!|yv���ub�Li�"�	&��0{.��c\3K�r�k�YR5{��p�_]��}�����VP��Wg�/��0�Ad�k�����Z$.a���rji��=�LeY������2o��c�G�<�o���L7�C: 
�(J�9,���F�d�TQ
6�&�[�R��,�LM�iaZ+e@�s�E�m.��J�����W�pT��=��dw��������kr���;��t��������%y�r�oG�q�����S��1F���?"���N<!�;K��qk
��b���}����>�a��;�������K�K��r��YT���	��
�ML��@C9�K�� W�-0�{=��&P"�-����\A<�
�k,�9��R�I��&��g/^���}s���7��.w�.w(�v�p��[���A	KD���`�u36�9��Fl0�@�-������)����v�~w���4���'=�������X�o��'���Y'������1H�����������,�J?�-�,�]1��� L�=|�����\�q�p��6��x�o�M���~�$���K����o.A/����Qh��$�r	�Y��i�������~�����J5(���t�5��oo��?��`�FvU�}��}G�Q����e_`�38d�VH&_0J���'�/W�����(�Q��>�fy��0������&�j�k��39�9|o���I�h�{�o4�k��v����Lf�JV8O�2�f��f�p�Hm.U����";=������'_����\��+|h��V���]5gn�cG@����k.(�[�7���������1Jj�[����B>ph$������v��x�������������_��p/|����S�D�):9;������WR���N.��L0jN��c��Cv_:�#����%\��K,�b��@e��/�T�H��!ah�k��T���x��t|�����@�������O���]����|��W����;����"��+����Pc<�F�����g<P�X���xqQ��d�<��f~�B��B8�'\��M���:�}��w�������<�EG���_����JIb%X�`j�����������z�-������H;|�����Z2���C������g��]����?��e;v�Is�������N�&2����bg�����]
��o�����=�����������h�]P��W�WH�<�o�_'	K�'�'��')�E4z��5�K�����:��#������~����^�=��]�=y������k`�5� ��V�r��� ��EW�T�8���� |s�q��8S�i
�/�k�S8�,��:�P��u�9��sTo�U�u�w�;/}.��+<9K^��w���y�AbN)�mCG>���n3
�������l&�@�k��������-w�={��%��>����j.rk9�w��}�:nX�3��s�_��*@s"��@��J���h�C#�F�"k��]�>�>_@�H��:�'�����x�i*3/[!��ip����m��J�R���s�O>����O>����g�xh��Z�>u	������gW����Yhv�rmRSwG�^P�Q���A����;*��sx���.����l�!/��^������A���MW+Pg�:>lX4t���kX}���Y<9�;+�	�0�@��(����>�������	���H7�VP��\�q����t�5���=�}}������)��*>K��iu�3�&��Q��u�/HrIi[X�����
�S�����GB�/!�T^����t�P�d*��|�7�.����M*U��(dn�nb�v^�,��P�)��Y�����=��W����=����M����6�?n$S;����;�q�L�A�}�\h/��)m5#�o�(S
�q�i�Vx����x|yq������_��������mi��]��\���hFy\!w��[
w�S�Q����	��d����
�G(����$FB7��
��
�>�
�e�+a$�e������G}����_MF??����^�Zk��:5��!!f�������fr�L�_J0���p$-O����0�SP�7
�{�����b*:��Rs���M���}�����EoYG��I|��.��]�l5��`3=_Y������H�(7��������Oz^����@�����'����:����a�M�N���"F�8l��A����`��~=?1i��y�y��0�_�n,&�!Kx�,p�H	a����b�"e�"e��4P'�������op"���K?��pv����o>�����/��/�4^�X�J��jB2��j$���h6S&�Q]�:��Bm��NI�*���������M����*S�/��������K�YV��`�L��pa��X)���mb
xUjE������.��N�4�j��V6�����c�^��8.�; ��c��]���Co��2��rL�b�8��S�-\��]	S�@��,��.�����������s�-����S*�"j�?���?�9����@���q���&���t���w�?4E���)a�5�`2d��
%�5��2y���FPE�8%�X�2��������Z���3\��'���L���U�S�������+���|�!�|%[��r�-c�)sLzj��`x�|��d��Dh�X�]W
::�s8�^�RL�?�!\�9�-|�z=.�.,L������p!�k��t���8=;[|/V�-�Sc�W�
XGL���e#����
�`x���S������8��[����aI�wKI0��L�2c���6��\�)�����~M��H�9��:%��)�c���o$�cxp��4���5���o����J2�f���
�	���*�4'������[���s-/����&��6Pw���������/��z���k��#��-v3��X�����i2\
�Qi���z;�qp�f�0Q0#����8a�����6�E��.����X�.un��W�MNX���>��o��������7�~�7���#�e"��f0��M�����X��/VYky�;g��"��/L1��A�����:F0�9��
��iY����5��h�&���Q��Y��V��6��J
|M��I�
GK���f��x9
"��O��_�
n�:��p�[�y���a�cL�����t|_y�HL�"��92�!�>��*|)lC�i��g{�d���0XG&`�q�Y!��o��B�|�("uN��K��l�����bo��n]$
t�"`��mb�P���?��^;���4����'��q)�?�8���9K�`S!�����;-:P����o
e�?.4�>4������zV�B�Q)�7��lKu��	v��,=~$]����6��e������	O��&�gAH3�*nU�$�w2�@U��I��{C���sd�~1�^�e"S}��~tP���F-��R������!&"�������/��=������h�m��&�h��^��.�z�[�W���`���8�@��#/c��Z��Y�����n�)s*�#�����2���k�<����{����>�kN�y������,�� k��~�\�����/r�����q"
K=d�P{7���F+��q��}y�kI�.�~`�	K"�����<��c������1�[Z4����(�C4�1�$at�,�5��������\C-�tL2��fqi[
s�8z������Q��-���)����keL8����ioob%Jk����o�����{_=���o��}���O�������������T��'�Q���j�VA"��H�����7F�E*L���M%�in
	������*����I�`�����v"�k)����Z�m�sx���RS����S����������,���$�W9x����o�%C��(�*��*��b8���m���w��y<x�uJ����	/�z��f�#~rC,����(me���9���	q�x�ow�����,��%�pv�,��9��uJ�����?�5$N�D�U�x�6��A����X�Y�Z�F��X����-�zDw��6��\����������A�;�����Az.�XXdl]��u��`�I��o1�g�VO��q^���F�z�,eZ}Za,&��-�F������O���������?u�8
-z����Q�=�w������S�J�]���,T5������ICty8��(����gaA���yn��=���9�+��������cm������������h�w�.�������`��`��'��E�8��;�N���m
h�6jdi�&��|��g���~��|\<p'8'�4�J�m?Pe�JT����d4�c�;E[���-�c���|�5��8����D����\�wh����i��Q��&�z�rA"�Ik�ly*���[���Vvn�-Q�H��W�8iB���+�����r�+���u�R��r�1X�b�"�K]�e�i�,�(7���t\J�k/
� �n��c�a]GS-�-�������$2��<Lg��
>�3Y�MQ,x�RY�#h��l�v�='��Rp��Ei�.�MI�
��;��i�|LO��RZ���9kmq�rX�F
lr_{�H8p�hb�l���)=a+�:\E���J����T,t5����@��s��b����	����������u��!e(�����t�4;J?����:�f0��Pf���"[we�;�d�������@\i�3P���C.w�~��wg��s5L�q���-�m���3�8GXf������>P���BU���Bq��(h�Xe�A'��PV�U����_�Lf��R�CP��7�y��<*����?p���m3���C*�
q�H��;����K��Y��@�U��gJ�h
�j~�C|F�E&�D�YW���F��0l�%����M�Il�X������EOa\��+i��:�`_(��>+�uM��{Q���|���N��K�_�X0���>���A{��'z��N��'z������D'
���h��]��Q�~������UsB��E�|(�����R����P�,�y\E}���q�dy��8�tlF<��h���0b�2S�`khG�<2��h�l���l�a\��RD8�-�K�	>�����L�����`�	-C�B����[�9����������;�������>��=P����*�������=uy{vq	��S���������W��~���p����C|���_6�_~��I��?����o�����;�������9|����g����������3>
F��Pj�<~�e���4`��G��`
3���^~g�u�nRT���^a���o�s�y<��fu������N�����������;oW����~��{���:�K�7�����"�/���?<z�{�d�������I�=�Iz��tx0���_5��D����C/,��&	�S�Dt?�Q{@�O��s[�d�����)rU8w�\�8|�-p���)V^��QVgm��l��+���`���&�.�F9��H���>;�]x�(��!�?L�p�b��4�P���v�<$6����C/Wf�[���Z��
C��0�	�L&����VQ�����G�<�\}k�5����xX���Xt#C�t�cV���A0&����h�!_�__��&���X	cZ�M'�e^����Y��|=��$n���Qm�
r��d4u�X��6�4`�nu�M��-���V�0%� �6L�o��3w�����*p��TK��a�Tp�!�qB�.�!#���PFL<�+mx���dF%��������qAS����`��w���t,�[U�p���}J����R&������v��������"E���/}J��`����{�
"��8j�y8[�:�E�N�=��w�v�{�{v��E
%B1�A�*����K�c�1�$4���8���18����j)��[;���',�/uN������B�(`�nX���n��A�b�hyF����F`�M��JuA%{'x�x��_���q��R�bUg�C�i���������W��������L<�p��v��4�D8����M��QUU)���18��&��;����"cB�^��[�a�[��C��I��YKi6��\�!4N��J
0�,����e�*�@x�L:U���}0�&�
"��
F��w\��@�M�����C��A��
���������6*��uQ��{�/�r%�|A4 s\���\y�$�T�=1n}#�PFq
�=u�?�3q'���T�!P���v����2[�]� ��!��w��������D�$�;�K�9�Di�W����������a�u������O��*����N�\��F��f0���>���o	�1O|�dY��|�n�gSw���A��������F�x7;�s�a����FR�Q��^v�E>.1�K��V6W����|����wR��I�y���m����<�Z�����&<�
�e<�����������x�m�����j�eN��f�P�,��n1����$lQ�p����>�:z'y�2���q'8�P�n
}����j�w�_i��M�3����s";$��Nt_�"��!BSNc�q�/f,�g�-(r�����gTPN:��-v��oz��t��N�Aw���� ���:��\*#�������y�b���k����!���������[���K��-��B��FIcS��	y��BY������P���A�.��3�^��t�T#b!�p&�k��� a���4N�ST��mt��iH=�xYO���JP;�L�d��F)���YR�<�[8m31mp}Z����S��!�7�xp9V7�S�4��fU����������������+$�i�@�(���x��}�����(�)��g���{9��H��*'������206,���Ah�����r=y�����W<�wW�A�h�?�V��C�����E'�b:���=q�fZ��"`8�q4���?�7R8PL�?�q�6=���'��@��F���P5�L�����q���5\��:���Q�@��1W)Wn�yB�Zs���B10:U��s���UR������;������|nI�I���2�Rx&[7g�IZ2`l�$���-,'f���`�TIGlL�VZr�G�E�F�87�(���c����m7�B����Y%x�7��� Gy��E������R,��������=��'��0<�5�D5�����5��9O�p��9V�I%U�eb�L�ca	�f+�cc�)5���F^h���
f���qm�R�"Y'�e��	�1�X|�i�G^�kk>J�$��N���LX��17��a�&*8�V�\_����q����{7�Of�dN
:���a�*�i��i���r�,�X���HQjQ:NAH����<!bg]��g��m�S����1��:��q�M>qz��i���}]�#[D^/�R�z���O
���mIV2h-�������z�&��"cM�r�'��r
v�"�I�T����Y���e�I���������/�/=��/~����������R�KE�p�z�_��"0?R��r�����
8���D�0�_�T���2�k2�����R
hY�X�����+����'�T#�w��Z����>���o������<�����w�����H-A]�����0:	bn��As���Fd�0LP��S����-�zp���>�}}}y��[��x�tf'}rMf�d����w1��.mT��l��!h�J��$�q��mk����o1�@Go����x���in��_�5��
KGA(�;��V�D�H���c5�[rjX�>e�����I�FD��}�Tw��Pn+>f|Pn����bQ���������am�������gV�lE�8����Yb�&
6�e"�/k>��+9rRX�^w
�u����R:/S�la��2��o�5�>��\8�|}NI��g��:�!��z��iL��{�i
�����a���+� ����B���2q�N�������pN_�2^_��@
����f-T��iF�Q�?D&w?��^��������9Xomgzq6p`*���c��
�VL�ES-S����Q?��S�X������yS	T��WO��zI���S�J�;�LU�!&7�)G�����N��l	�
�a�����L�e�SK��������%�Ca�����+3��l��%��f�1�S��O�u�S��fz�(������T�o�?�Q���y��<
�<5���~�%��C�'�����0��<#�s������^7Uz��i�X�S���r��ep������C�Nj�A����`a���y�N�c��rBq\�`���,VvH_1�%u`�GL4����������3��f@���M�AC���~��
r��>���3�wF�-!��������M�%:��jH�� ��$C3$�>:U�q���J��o���~�j����o���W����[��"��
=c��j�o��zv����#P������O�{?y���_>����W�n(����tk&�q�L������;���Uk0�����h��f7��C+����Y�1��7�vS%m��Rx]�,��Mo�X�o�	�\�q
oD��b�x�����qA��2�z;x��mF�������0<�3�Dh�0?b�����;pHe��[/���>zT�����V@rb��>�
�Y�~�����HXv9�: �����vt���\��Q?#C������J7���C�/O�9K�z&������a������[]��F���������#�E������WW7�����K�Q9�3�Xb%/*��c��B�{�>���)S/�+��{>�f6i����N��F�����f����Hi'�_�G��:�$2���W�pED����|����Q��������Y���5���]����w��`��	�Q���;����Q���
�(��BG
�|���oy�>����0�����+*)�LQ��t!�h���x:-�+^U�&��n�����wu��0����	��> ����B�,��"�Ar��D�����^
���nq\�
(dp�K�g�i�vR��*0���k����LHiN��c���a����t)Gf��Jk�
������M+\��a1+oX���������R�j����������k���>F��^{����������S<��D>��2o�����2���Ty1k�V3��P�4�KE�0�����m+��@b[9D�j��4�����%���7�V�����`�F'���Z�[Y^e[>�g��9���s���
o��w��rv����3h��0X�����0XR�j���S����,�q��	1�KilMw=5nI�a�Wk����������:9�_0g|�9f�a�����@32-�im���z�Q,�Cl*�,l��W���bE��7>���B��
�J
cK�}k�J��jE���8�\N���6��J`��$����;O���VI`(�<�;��9W�8\sV#eQ|���M59�����@�4�U���-z ��&Q������`����&BD��D��

����?�{a]����$FuW�.���Jkn�	~�S5����h��q
�f�"p��4��������u����Q�	#��Z2����8#�8B�A��e���9}H$k��[�,�
��5����Hf� �x,��1%�����{l�����E���\�7��?���$US����D��4Y�����}k��'?&!@�^�^��N��r���A������`-�B�)�1��as��� X�T
����4�|q��=P(h��w��������Y��[���-�	7F��4���U���\�J��m+��:j��;�����3y'�_)M������W���G����t']���8e�c��D.h���f��A�w�(5��������s\x������X���b��Dn�!.#zi?7��f�`�j�����R��T6o>?����O�	jo$�fh���U|��e��*.��i����<��L��1F�"D��v6���5
X�`��9�2-$o��}R�}a�gi������w�~,s���]�	�Bhgi��<�@���<u�b��([��h�V�>���Z`��;P�u��m��JG���~R�!��[S��\#�@]o���o���VR�w�����{En�R�}�~?#�����pP������M���Jh�|s����s��3���N%�R�	�-ko1�.E���^��M�j]{�v�=��Rp�x�i�����8�7����t���8l��(g[�A������q���6L&��k���&/�C3�c�����&�qp2s1���~
���C �G<Q�X���P��E�_�h[�BtV�U[�M���T���)T�A7��x
s����f���e�� �e������2`u�����������wg��s[1����G��	WK���`�.20Z�<kF�[xco+�%��`&���ti4�D\�p������������@���q���XJ�-$�O�����G#<�!+P!���/�>���v8�5��`.mX���cZ����������u,�bW2<���x�3p��Y������oXC{�[0C>L&���(P�
�����1p�\��R������|VA����b�A
��]�����:(����]�g���>���A{��5:H|������BSeH8{�a_���2��"��7�b!�$�s
K
�-��]g�(����s�;�����liF���|�%��zx`*�{�������\�Zf�W�A���Lb����\$"��#1WC�:f���t�����,St���cY�0������U����6��y������8~|�����k�X������f�=�q�D}���v���w&�@�\=`�s���&�?����|U���r��^�Xl�B
�/�$K�u��U}Iu�����v�}a�N��T�_�<0])i1�s��Gg���1:5#��?M��\�w)^��$<������C�j`������������i�zD����P��uV���t1�&=��%��#0��)#�����e�ogE��o������\3��4XT��|�&7���1P�\�V>��`�`��2z���I���	����yc4���!	��ht�k��h�esy����w0`����\��?�O��1hd��������fA��l=0~-�z��fX;,R�-\�c����`��j�Z�H���O��"���PCo�@��p)�4t�9���c���v)F�u����vLjaF"=X=������3Pa��ZS`���Z�M���)j}OV	Fe���N��Hh�Y(����a��9!u�����U8Tg��+6�&6V���I�=[;_��7��^_��^�/.n~��NS�w1O��D����$BCZ4�%H�8K��
�:1��0���(`�G�a!��������h-��x��M#�i�u�*
ut���m)
i������������g�o��\w���[����`_m�:!�E�
_	�X��1y��b���E$;`Lx};�y���@��z��l���}��`U
�&'u	R�����V��u�k�1��g��+ Xv�1�(9�����p`U8,�^���$6��������dpA������T�U����(�iR���v���Q��/�����S���O��Z��y���	s��Q��+�5���a���7&c�{�}�Q3,f���T���M��L���}�H{�NB@�v�������m�!��1��u�t|r���a39]�[��;���-����%L+b�;�&���RtaF�(73� R��-�r�������+V����������FP���9G�����6D����R`n�3���KX����hZWe�����Y�
#[.[Z�S�e�5<�9���a�D�kd%@l���n
�d��:T�~����Y����BO�3'R�l���+��v���f��}�
�#���`o�����c:��<,��������:eSit%��F�Ug�nhlr�2&`���d�J�7
;�V�oq���k<��U�w�T��z	Q���g�����*|�M�&A3
 Hy�g{S���x������}5������W��+CJ������J����'C���+���EdN�	���GM��RB���c��=���������9e�!�RVQ��i���T �+e4a���n����\�{�|��<nVv|���:���\p���r$&9B�1�9B���������$hG�f"��:�a�I�q��x��x�S��u���Xf�+���ql�RS���0��A������3�@��;e�
S����C0L�Xa}�������7q�H�~�UZ�2�D?�e��/�<��4��������3&�������?�]������Z����G@���t����'�����-X�M�� �'��v� IU���eNac�������8�V�+��������u�O�FS�,����<f�;#�PW��y��z*U
���!���q0j�*�������\M���:���[`D`]D������!4N�����5�����������9��sH���b�������o���f\�y!D�6�+b"����R�m
uR�����=
��UT����p�K�q����P��1Z��*9�B�]�y��S�@�5Sj}S���iV����
PF@�M�v�f�Uj(*U>�;�@}fx�?(��{�Mv}�]!P�������$��u��^Q
�
����\�����BGW����yQt�1�#�\a���C�!�N2P9�{��2f�QD��� 1��}�����v;m�"��� �.�&��:;6P\~���������
�:��Pp]k)0������7���N��ujw$��i���P�,e�g���K���I���N����]�����Y�bf����L��e�2?��k�	/l�Wd�I�����eY`sjBM��i��t�I����6����7S)x�����s�����mh�"R1�~Q��?�A"����i�$U'�������5�����_��(�s�����?I��M���=X����m!�GM���Y#�c�l�����h�����G�`�d����_���$�h��E�YI\�f��e�����
6�6
�����c�jP)M[T.,5���/�e��u���}�9b�_\�����c�����J�)����_�6<������o���/�.i�Y�c����� (k=��A1	:2q���%:L���������}���4����z80�F��jr���{��E��*��)K��������L��:L�#1X�|D)�Ut�W?��"�t�%�4��{���h8�����M��{a���d�&�����K����L[�+G����
W-�+v>��b	���K����/x�rSZ�Av��-��K`r��G�_�G��&~�$����&J)�xh�Q���Y��-.���78�`�������B���d������bB��f��S��u�!R�J�-�����[�cU�X���V�IY���_����z��3���M��A���p�R��"�	��x?���������PA�L�?A���a'������R:n�����f� �JcUKV
�s�P�$5Db�rV�"����<����&43�@�>p>D�a�����W�u��D�fn�1s�v��*��L�'�c�V�MSH1����;���4�z1-pV*Jb�eo~l�1�!���&f�/PVA�f�:=����/w?������_���]&�y�kg�_����|'�,T9����T/�����g�-��,���7*b�V�=�a��Y�xY���\�j�Y�W��@!h0�r������j8SP�iZf�J����?s���_��b�7��?xO��1�x�lJGU��7�/�����bFf����zx���>�3YH���=_�m4RJM

�����4��^��(����}T|m��{o������V�3-�����A��L��Y�S=*�S�b�x��8dG������+3��q#-3��
((�����\����K���y����5�aS*��o��&.<��;�K}1�������Sz"�WG��Zm~*��7X<h.2��g��RZ��uy��Oe:c�_����k��%,V��t��:I�a#5,���TD�=e�p�����>����GO~��I'��������O>i?����Ow�����+
��	�]�jT�}��L|��K!d�jb����&�-7:U�#�r��4>	��r{F
��G�q���F(�qLfN�z\8A��������]���\a,���.Gb��u�	�Zg�X�� ����qa���C��{������u�D�;�?�c@`�P�����cmL6��4��nCK�D�B	f���`��6j�N���6:�kZ��T�������$'���d����4H7�H�	�T�+e��q����VxUo���+�_�`�M�$
�{�K�vO*xX��������h��k�����7��5�{����D�;�f�T���rd�sy�z���Z<�:Y����N+b0G�4����}S[�d7he����|Kx��xp.#��*����NW����obc�-N�����������o���w�n�)���i����9!0IO0�fL���Uv��s,#�gGf�h�@�-C���s�1QN���n����_8�t���e��{��~�����F|��� �#�`�Esg(5C�R-(L���q,��_�N�8NXj�B'�:d����r��?m����V/nT�_D���J�a�m`8�6����N`sj�k��'��<���cQ���K!��=3~���N��WL�i��0Jw��m=P�ou�����/0��0�2`�����u�L2PV�T?�A� ������x��tkh�8�[����/�t����]��k�0�diEB�eD�E��%_Z�����-!�j����V�R��Xp/��[�M���x��5X_m��}l��W>����6���_�C����������=�o��L���/��
tYK,����!��Toe����1w��H�?t�F��F�)����w�|N�~���p��m�w�r:��N�e>��b�-�D��0�4�Tb1�G�9�;v������0���v�q)�n�q�#����I���l�y�����db���8`�	z������jh
K���L�XP���Y�Id�h��(�C���Z��B�����������a���1o� ��qq�[����X��������wn��V�<��
������K��m��7_�>���?��g��a������~����EY���I������l�9��s�����~	�#?	����g|�V����y���(�i����&e��~=��f"����{$u������*U��B05��9_���,nV����bV���OH���|��������>*�N�����������+�x��������������~��A�%>)6���f�rM�5}�'�u?����w05�K���%�O����G��>}J��myT��eoV�#��U���s���Y��E�:�Xy��GY��}��u{�����Q���SW��g�mx�q�E�����5������������Wr���J����&�}�~�w���Y��9��"���3��A=��J)��$�������U�����Q�h7>��L�����q��p�i*q�Z�:=�4k�B�c��z�zs�Am����M�5O=�&�Q*	��'VgMkV��P��'���`����p��+cPB��<bl+ 8���9����q����Of�9g
�]B��C�:�;'V8��XJ�	Cp A/�������0�\�t���S��4�d���a�T�a��,���]rRX�-7	��aDl1���N�*<�(��=���O[Q��b������;KkU�����mU���<�z�u�R=%��cbY�@�����!,�Dp_�F��f�4���������b�r��3�W!�<4��kJQA�Z�fVq��ppF`c �� �J(�
��F
G�"���]Tart�^�����bdP��/U�ab���)k�1�����$�������,�:h�v���k}�V�(*~bz�f���u>�n�t�$�]Th�)7,���Du�H�Q�RQA������Q��K�P�
�t[UC�F���
��!�4G6���L��d��?M����T���_����L�2�W	����+x��<J����
.^e�k,�MV��s�^���E_���XsA��4T[�����wcJ�?P5������c�e(ut��\e��T���]R��H����OiD�QT��u����>�t���;��":�S��gP?����QK�SN�d�y��ZTgX�L����������h��P��g#�_��$u�V��%��
��?|<�d	X���4rU��k��N}��ZT^�3��6����Zs������|���X/���">�������p+�5�\^��U-��`�:����bT9��*#a�\r��A�;����T����N����D���eqs���mQ�bv-���D��q�Fq��?z_�����38l���"�'O3J�B�`�
dl}\�f0{�J
�e��)�����O��
+�4���#��z�[��e������P��wf&2��KU��ZA��x}�jy�����e���d�{�����>/m��7��3��$,��3_$�L�,�
}=��@CQ����u�^�Ch�F7!Vf����������z.^��2��
��r��UpyJ����\,����/��O��
n������>�����g�:h��;b	�����������
"+���{�iG��+BC�a�����=]�7�E����#�g��������/^]�i8����Q��,NG&	�����0I����,��C�_p�
�i�s#�Q ��D\��5%���nUo��F��,��#�0�� �*c���DT�,���0���H�S\�i��k�~s������H�%c�@4n�d,"���9�J�[��j�F��S�{���
�d�uph�6��mT�!����A@W����`5H�\���^L�����1u����f��������w������p�3.�p��-	fN�f����r�A���n�w�*��DJ����c����	�3M�&�n��-����$�co�t�[k����!(-#1��u��ej��-�&�#Y4��	wS���
�O��o�����mo������/C���f�U��4B����:~�jW�>���q�m�>�@�>���!���y��#,�
z�l��m��1�i{4������H�
*������Z"X��THc
��	����>��������lG,����e�R;#$��Y!��r�X&A9�zh�m�M���<�5t)(�.��d\�����vBu�k�1x�q�u�[�H��G�M�;7A���G���.J�W����W��t���1���;%9V~����:�*Z�CT`�E���k��F��w��������Q ����)��(4�>��S�����R+!\�*H�<���<h����K	��fl�r\����4����zC%gn��Y��X\\l���<o^���-��.�����/���~����X(����>�]��������B�j�TkdP����7������3�d���@Ir��a��]�������MsS`ZD�/�a����T��>?|���i� {2���Vk4SZ�]t��-2�uc�2o�)R�*i�<S�� �@+�U��GM_`���5mk�3���&����Q�b<=��1[��aN\��{hk�8[Rh����X����F�!q
���e��_\��������Y����
6����2LE�m>!9!p3����Y�E���\r��T6�B\Z�2^���Q�$����0t�c��!��h�1�Z���:aE�m;�9B�p>r��6F���p|f�'�y�"^����Z���eO&i}�I��*�Rt2�{�`70���C1�q���5�a���?���!������
�n��%F_n___�����1��i�Y�]eM'5O�r�`[�A��|*�8�� !|�nn�|�83���:�I�oIg�St,���Pa�U�H(��#�C�q�])��&5�^R��t�'�<���V�;��q������'
�M�`�?I�Far�����P`����cF����a���+a'�B�Xx�\����EG��\x�fdF����o�>6���Q�8A��� ��=s),El��%���r�0��5�m��i�����Kc6;@M�Z��<u aJ�����D��^�T�si�n���$
���9��Iap�(���`�~��p,%M����	]� "k�%���+�a�����)�,^���m���X�`y���J�vV�=�[L6��d��m��7`�����`M\!_�w!=���������	�U���N��G�N�_}�=0��\����{^G��i��A�N��tW�����P��X���6�o��	�Rn�k����d������c�n ��K�����S���8��?�����\�����?����0��r��{T�Q_nc��;�g��"�y���4�&��>�_��Q�5� �D������_���!����������%��a�3��g�D�o����V!�t�<(��VIJ�t!��n�T�������E�����&����{�w��MG����+>���t�T
��q�bP��Q%�e�3����0.Vm�!���������t�oc�=X�n�w���(M���;�I��XT�P)B],��#Y��`���&�^�0	,��I0j���'�E���H�F��Aq��W|�IfCG��s���&P'��p��UBh�Jf	*=���EN�����\c�rIQc��	w���RE�e|IA]�R�H�<���}�2�?���C�w���.^����Gm���=n���t�9V�'N�Vt)�[�!�%��n�9�����t��h?J&��a���eW>7CY`+�H3#�d���������n��o�PvF�e��%�p������W��!H�gq+@�j����� ���)�WC�Dd�����v���@��
��"��!-A
=PSY+��*6F%�~����o��R��F�wv,L���o�/�?_P�P�:��	S����5Nu#s�P�B�����8E5�M!�����^��Y��eq[O{�
p}�6��D��$bH��IM��T����Xj���`e-xGl�u=~�^D�GFE�;QL����d��Xa�j�l�.Y�N4P������=�[]T�����YX��f[f��	Cf���B��[��G��=`�7����1!��3��<�b��J�R'�`4e
� 9%�c���Ir���4�!��/�E`��5P\o������G��GG!3�J�l��!M��%^�m� Iy�/g$���$}if3�|5���s��\���v���C��O e�s��rX��������0�f�����T�_�[GG�����$������e/����`
�����?}������X!�8Y�����Q�JQ��B�J���C�������o0�C2q��8
�����8������Wy��h��u�f<�@����#(~��f�����
���Z���'��E3�P�/�w���RH-"�QZ��B9S0�I�61
tYi��2��E�Z[��[�.�*	������j-0`P�U�2b3s�Pw��oWW��0�����5�������8_��4���lmJ��	�4`�I�U��x�'�@�<�=���Efi��*A9Osc�-��� �����D�
�b]���w�Z�:�64�TNb�j��Aa"2�]�q���!3�dx�����E���� ��Q����z��TL[��RHpl\�,[�����T.���"����4L]
wX�@��N��+��P�J1;����UM6K���3"q]�K�i���(���ZZm�{Ka�� a��f�^�$��TX���x���m����8�8A��4

���L��k/���	�.V�����@���^�
RiXo-��	�!�`������� �Q6f��[E�+�ZB����
N_l��J�Y{�")��V#���Vk��"��6_4�Fpl����,���b��,K�QE?U����M��O�3u5BxX(��l��o������)���I�����������v���������0��!����1��}L���oF���9�c��L�6�RR=�x�_,*
��}�Gu����������*z���x��3[}�T��iY*dq:5v3��d��>}a$c�l�	��{���SF�y��}�$��Q�`/�����0F��^&����fS��w��m9�����n��T�������<���n��� ����6���)�$T�����V���������X�w�;�sV���p^	���R�f�b��R��Qn/y�:�J��[�{c�����5�0�`��r��_-���-T����!����~�D-�c���G�%��*;�:��<�����!�5�:�k�s�����B��
KJ�:�S�����E�+�M���:_-�%�t�j����wb�"88f�m;���6����`�6g�U�����E�*�nMt�3�����������q�u�aim^+qi����n�����������W��x���v����������/v�y�d}������Q��:�n�+R�7�k'�����(z�E����\o*�j���	��T�^PFR�_� OA
'E���y�R����}��A�^um���F����x�U))����M��X��m4���}��G���v���+�kQei���!�����R�Ko������@t��[Y������:kvZH��!�7��bSN���ZD��u������v��S���`�\�3r�Y�a�����NG�u�P~�����?��n�j'�^E{]���M�x}?�����E��f)���84V/m�/e9���?m��n�r_s*ZR�L���1e���k��Nb/Z_��iJ�&"�M%�����vb�?���Jk$�2y4a+����VIcUt��#"6B><T��aY��5���=��F��6AE�v�6�RU[����N���N���R)���������1h�k�/���z����	�Q��q�kt��r<��a*w��E����$+?.�H��uPA�,l��X�v�4��i�C�� 
���z�x'4T���xqQ���������r���'GR=��3�i�����dnA�~��-�X�I~h3�Z�+1�|~<q<`	���a�'w#"ra��[�P7*��lIg�����[�c����������'��SKp�u����r���=\�c�vj��u�/o/�}w�$Ft&&&������k�U��7�/����'�'.�����"�|t���5(�6�gn����},�����w/�H!��]	����^^��7�����2��},faN��#gaF�_����%{�:0_7
�
��8S
��������6/�c��$�c6�T�Q����R/^�����3^����
��:%�/�~bV
)(�)�
K\�3\�F�M0���+Xy~~qu����5�����}t�r8S6��u�:nX��A�j.��-�1�(v[I<�A,6����Y���R����'����2,$�2������t������&:'3��mg�������|�hkgw�j/����K�:�����(�{�w����*A8�v2�Q������Zj��B�X-��V��������1P�;�+�	���N�.eX@[��j�db!��Zq�@P�m@��2��LH�^�������7������4�?�L�����x?��$�N!�H~H\�</_����
�d�3W_�#!��P�c��aA�eQh&��|�7�.����M�U�P[�6��|�rD�r![��:&�66u��S'�g����������?fX��u��q���YH����#�q��]��<q,�T7t�aY����qXl�5#*�=_,�K��hCO�A�s���r`S����A�c�a���[
����_����8Z���)v=�A3���%�G������F��������
03���>K^v�2�������k���DPn�f�\k��lb-����g������Sz"��d�����*A$RJ���'&��	i�4=����2��7J�y�"�B4e\�ya3�d*�������9FL,H[*�@��7�z��O����~���o?�dC*�5~����O�|����]�{��(������d�1B����F���1^���\
����5fU*bV�7��Ah��H��V�����8|�q��[#�t&�	��Cc��5|Un0�@�/m����	���q
P��J�W��_�2���>Q:�;���/�����e�X��E���������j%��	���lvi3h.q=��^p�~�/�%55jS�����t�F� ��+r�0��f
�����h�Y��f�H_����19*T~�����zS6���zk�T���F�VNw>��jR��c���2���� ���Ek
�7�+-U�b�����[����tP��8p��q�P���fLz��A$�*V�������.>q���X�;�
"�8B2K����f���:]y(O�����5����iIg�^��������=w���N���#}����s'P�O���2�������OV�b�I8���4����������H����7����7}�x�K��/Ja������x,������>�70������?�������'~�C���^}�����/^=�~�':��Nt��h�0�����������|�5?�����������E���:��������w��
??l>����j~�u��o������5?�5�l������6�����5?�n�':����~���c����>�u��)�^��'���y^����|^�?����yY����>��o����_���>]��������Q2�OW��r�>���F�>]�7���������?]��_���O������O�����?��������5y?|�����U�tM���������O���_�v�y��~�2o�:�O������������~z��w��Ot����2��7h�?���Ot�����_}���}����v���>w|�	�':��N��P��q��aJ�=M�#*a���
v��)O�?�������e��t�wV��������0C��O&-�Ki������	^Fr�?�XJb���L�����q���@�j�oi�sy�G
�	��*4���}��j=�*����Iiql��%;E�FUK�����t!7��(�5�P��m�J�����f���\���#\:h!wqi�V��F������Jxghx���'�������/�v�<p:�(����Zy�>��5�D�h���t,�4�`���%(�E�rFX���w��_P)�\	aU��Jk��$�-al&��(��g������@���a=j�
gi����{����r��!3�u�Q�����7�T.�x�!�������$W7[�v�V�X��n;t��l�%
��p��)�6��������])��)U@A|����(7J!�x��&�x��&�����O��}���}5k�8v.7���}�N6���3{���~�"���R�6�uQ���3m�f��l� ���~8����w!l�������-���>�R� J,�~=f�.������G&�z�(a���k������|k���Unj���$Z]��a�7��0��h]��Pjk��nU:>���\���,�i��%M��["��,�	�8�����{�1����e����AS�����7���j����J'(`!����V�$b7�a�QGY�������`��B(��(�=�B�%[�z��u���Y"	��h
R���=i�B�;�q�]��U2��V��jVa��j��������Ae������������(4������
�C���x!e�e#A���c�������)	&-6NC5�O"��l��:g���0����+�^q��RA�m���s�u9�i�uWdH]�L3�����a��qS��{�l��G��I�����6B�feI���ZF��������O`30��ND�����6K	�Zw0���Y�lG�'����=���|wq��V��PA����?���}vy{q���%<����������`�	
i�<v
��q��~E������y�)3�QC�����!I�O���e�J�h��m�(8�Q�&"�����0��W��F�k�(����]�)�L]�E9���ye���UN��{������c� �p�C��	��(4yX��#)���b�i��Z�>	"��!��y+�>��xe/��D��=�9cS�*��O<�������	�&��c|u~�}�5Zg�%v�yV�����:k1�[���"���Av�
�ke��T������8B)�`�b��W-g����gO��� i�/��h-�=���JJ�5q��<������<w�8��8U���2f����\�je%��F�����'��:�P|����r��)�Jb������
C�bCny#Gn6L��eCV���-<~%�KN	G?����^5~�ox�G�Ij����,��`��5��G��<b|Kz�^��@4�x��AP��x���pt�0��X�hqLP�b�	��0.a�����$A	�:�k!6D;���N����v������[��IP��'�o�R�	��_=���o��}���O�����������(���(N�q��8m$�#G��(�U>�*d��D�0�5������{���R�����%��d��m���,���H��s3�%�E��������U/�*N�8��������$�h���7�����F���~A����8,�o������Z#�439�A��-��}�I��`��S0c�>���1T�pw���	F7�;PW�����u?hEP<����<���]��E��	�(������g�0SLK<Mt�FT��u�F���a-�H	bU������0�����Zw�\�_�Zk�`L5TO����r��#������?���y��]���OU��i�D����^<a�H��,�7�-��-�z�uw��6��\�9���w�h�kLC���9�!0�����PUO��������Vj0^T�<�4j:E)G��L��D�f����q���8?���A3��Et�������{��V'
K��T2c��I�JV.��[�`�>X�h���<���0PGsG��c{bJ����<�����0�}�QR�.�4��7��J)����zJ2�R���#��"oYi�[+6����- @[g�uU\�{y�z&a���f[U��Lr�4-����4����D�����SS3+s�.��eW/��W%�4���N������X���:_[dN��T���b�s�`��#q��'A�U5V��u���2fV��[��!�5���0l��C��/#1m�2��8����Wo9V�{���\�3Jm��;�'�0%5|�Y���x�o��E��#�?�?J�~|,��Aar��b�O�X��[��;����&�XM�c��0��u�zF�a�3�0p�r"]ia�a5�P;�L(��[�%N����}�����.��8�� ��,����h�RY�=f�j��.g>�t���5�fq
��&�<��^�Q)���k���b�F�����?a|�V3l�(������^��SG���[XB1�';QA�LM�����7^�q�|����k���z��S�nV<tIV�2����r���OW9�yI4��EE^)�����[Y��)5����j��?�	���T�}�y������o������8F����8VYl���8�y�������T��e�)(�jq��Xg
#%��r����o___�������PK�����M�*�&���$c�PN�R�D�8�i�Ue����3��wF��E!h��4�2�i����dX����6J�N(E����h�K?��v3`��$���7/�y��r6TN�>gTP4�����V�n���FTm�r������d�i�@7K�%]E5���k�r��:���$�(k�RC_@T����`W��M��^�z���?X�T����GP�;eBjv���A���z:*����..��{,���������������Tb
_�[�{���D�h]px(K���R�.�WU��h�&����R7,I�@�<d��������N_��;J(����d�A�� S���*���,Z%�.#&��-��odc|�wV����vY���a[��L��pn�J��
��\SNt��_T{�B��s���g�3r$�)�n�p�s�5���)��^�t���)~&����XN��wPU��#���je�TU�>L�n����W�����<#d���N�5]c���;���������]�Z�����%���������Y�����Y��#���,�Mi�
�� ��,G����;L�K�d|�m5,�>�I	���;�Ux>�E%��$�D�����$�KwT��C��wi���#��+��������q�x�YB����"��j�'$u���!����	�Gv%<?��lCT�)�2h	�$v���J��aE=�	��|j����������dF.U�dW�-����������w�^9o�+��V\#����$/j��Z�E�J%q���`����cx:B�I�i�]3k<8�IR��R3k��dV�r�PP�.����Kh#l��v)�]S2�7��G���.�M�4e�����;!��|�5�h�q�N���;����5U,�E���B�����u-vIcWu���V0�6����6]
tp�Jg�N+�Xt�JgDK7��w��?6:��1z'��T���e_�{��8K[����s�~|pA@��H ���i�����~R��jtX5j|��w%'B�x�8x��q:>.�����6�8,�{��I���&��7�w�X����>�CDq�d4��P�0]�;��A�.��A��������BC�Y/���(CI��(��9�����.���W�0���)D_#�v����0Gf�~@/������1�I�@�RT����� ������a��K�+�����j�����n������ak�����<�d��3� }��~&��L$L�G�T��z���5\���@;�z���p����i����#���a�1JXr�)�!��MZ�A~�"$l� 1 ���A���H�{t��J`*I����}a�)��s6

�����t{�������4�v����M����d����t@-&i�"���k���3�Z
Ef�X�����"j���:�b�F�k�a3��Z��G�E����a-QG ��6��S <����)~N��g�V#�X�$�LH�5L���pR��]�U��F���o��<{u���W�?�����mZ�1��C<uc/�
����h�Kf��Y���3S���x���s~X"�)�}�>J���V�~��
��r�N&�a��T�"%M)W��J
���oq/����?<�����vv�[mS��O�6���k�^�~EI8�#i�hEe�������p��[R�}B�Z+�lsP���l}wxq!����Y,Ro��`z$������q����$�q�w�#q@.�L�GCu[L���rl@x�2@�L��^�_������C���4QX��M�it�������=GG��R�C4I���
Om����H�O%d�d:�A�7�Qh��0l�N��f��cf��=Q�M���ObSsS��)t^"D/���������;)�B��s1�?(�<�����,m�	���9��A��2�	F�)���BD@)F�A�1���~��[���{ ���d���I����b.p�Y!;���1��3��1��8|��8����#<APV�����&���$z���S4�l��P�����1����e���q�jj���}��&%���a	��b�7Vi��?>2X�K4uk��~.��>�����rd(NM�H� ��?�������F8zs������~����C��@\!#�\�8
��	$|FJ�0R�S�-����bx�jb�d�?�	��Um0`i���vf6� ��>T?`�i��W�Y8B�V!�,�{
U�����@�9du�8�Y�
�p�2����[2{���4'�Y�m6�������.�CC4��C������
k���C};gT���F}[B*��c5�t�c}7 aYkc�w';��	D�n@��o�)p��4�i�2:���������,���u�Kd��p��P�y�T��$
����A����If�>���@,MP�5h�`C�ST��%�t��6"$��DRIz��.!�Z+��>4^k����[NA >D�@��H������k-���x�Isz�E�tu��:�T��4�^]�VYk�ZQ��6���RI��t��G�K�j�����q���)�����\]�?�h�j�]��{�&��p��T�}��%(dBA
Ti\B�&G;�w:��������f�w��r
���v#J�Q
�N���h�8�����vy��}��
�w����,]_4�8a��O�&���a�����,�pc��c!=s�VEj�	��A���[�������m\������j������/�/gi��A{;���������C5.v��_�+g$�:;J��[dv|���-��W������\�}p#,�>*1�W�����a�����U���X��6x����G�E��z���X8����9M*��c����9@���N[�������2Q�"�\0c9��:�������x�04��%R6�#�����'O���?�����
}��������#���?��0����3���\����������k&�)�������A8���>Y�&UQ��fF������82tmA
��Ri��I��Ig��(�m*��� ���R��LS���9��$������g,1��w�x��Z��V��dD�&���	L���h2Yx�*�3�*���G�j�\���Z����QN������r�p8����J�dn(,��T��6���o�j}!�J ��D�/��������>\ *�3��{A���=�aR��=;�$6f�Vsd�����Mu&^yUqhSGQ��#�����c������X+�(�:H0q����f��Q�������?�a�6������y��G�N)�7�{���gs
�P�����q��E�!����3���u�E�L���^�t~�?�b�<�����H15/T�5
2!�������0C�Ak�$[��R�	��&
�`�{��3�6�"�h	e5�����1���W���a��PQ���9��C,I� P&,��F5,=�M�����p(t>����(��T�)��G�f}~�"G�����U����������p�hD�|`�)2� 	��S�?i����������d^�,1����Q�������������W��Z�,�s�\M4m�������B����g_;��'��G����|����3���y�����$k���s��)�%������}����������:���>��U\�������#�'ZG���y�D5��;�3���&��tK��l�4v���@���~��?>��da�ir��,\le�";t7�+�?�'���aD�?�\�O\�gi����v�����K#|�Q&#:xB?�����z�4�p����u�]���^��q�����b�p�����g��3}������q{���7�}������7W����1�/C����O�6����8`�O>��`y�`m"�J�*q#�%Hm�wFI)���2/+e��8BY!���T`�iUOsW8��|i����� �__?�}��#6���#���&��#��7�2��K~��L��V@!���5U��<:*	�u��X�1�j�f6I���F����KD��)�vr�,�!��f� o�+0e[�Rs�a�[�h����D��������+������B�E�*q�YAS�?��!��e��f}�T�yp��O	.������q���N�83���v]��B)�K�f��J!(_�i��Xa��^c2�tp9���=!��+1��]v����Z;u�����5�'����o�
�:���0��P�{�z�)���tHA�%���f�I�(�l�|I���	���Y&v[�	���-�gK�[��AJ&j��m��a<����X#C�)�<9Fi:�B�@��V�R%��0�����z������������/���%�;��D�0�Q1�����V�H�J3
LLLpi����R��dl���V�	��d[>}��?n�����������?���H���������=:��d�
M1'� �y�t>�����[&jx��K��W�@#l�������.��w���?{���b��������<-f���� V%j�GM,t��Z�f�(����3��n�<1u�L��B~(��,=��#���6�wF�[/n�Mv������XI�1a��jm��4�/IK[�{
�"�c�`����b[.�J0�S�;���D��Y+�	DB�N�da33k��`����	�a�K�Va���I���c��n���_�qoy��WHb������eb�!F�"��=�wX�In
8��vwY��Jj}X4f�
(	�H����(��a���X�����;�5�f��M@�%���[�%�ArE�~�Y������|j�{�5Q���`/�b].~"-��LAy�7"�#3
�p�J#�@.��(��������YfwS�Y5x������1��������$1sS�8'���9x�V�N����q)���p<vCz���W��o�����j�Tpu���.��*c�t�2\�d�.@���*d<[9EhcT�������6��������Y��4�;����v�,�AG��9O��6����;����	����.S����y��{Z?�C
I���NvZ8O�%!W ���a"�<4G�&�GV*.�p����O�@��Q8�@��KB�H]�$��]@kKC.����"�6���3������������n�S:�	�7����y�����xp����#��N��}O��).$�R�tN�7O�HJ�p4e���C��i]��<^p� 81�/�nJ��jyRX����������3�t�r��������/��V+�R
G
��-epF�N���Yr+�(�h+�/��-H������jW���;���t@��)����-��p�Hb����%��I��/��Xf��c�����&`��%]Ob�z�%�Bp��f�����p�x���g�Kx��o���GS�F�&_QZz_�����U��{���o��z�7�����/7������k%ep.`����9������s�?z3zG==���DZ�� B���*����7:��6l����q2r��2���@�����n)�{$=��x���n�N/��,�
��3��I�Ih��:*�o�������{(��/a0/������8b(�+���u���;tA��L���"���#�];�KKl���m������45��������vw�� ����\Ru���/�F����75B����p����R�B�m����������6��z��w�T�k���i���'���m	��`������"\_��s����vV���o��������v���wS���^ 5X|�-�c�G�$DQ}��5{�������v��LME��&�	��	N{���������JHI �I�iY��f���0lM{�&�S���MTt�7LT1�i��DugyG���yp[+�0i��0�����X�#���nAcj]��YMW�x��O��R���i ��8
�+���W|m,t���|}�@���SeDR-A}r5�Vh$�G���(W��hQ�����#(��)Q��o�TPL��� �*k
�> 7X���k�:��Q(��#X��c�JN3C��<��9�%����6�t\A�:���I]:��1�d�e���5�{��#�@ ^��}wx�&"�g�6�sK�KC��%�@�X����Km�\&�D�"��X)�+�p�R
n{J�������e1-�,���@u<�1�8\��X�s�0�pm��l��RX#��=�-�
8�"����`{+Gs����m��B�y������N]���9j��w��������������bR��aP�����L^��e�����X��9��$��1�m6�{��H_���
���ja�>{����)w��4���d��M+8�v(�eKK�)�q��(��v0,%.�1jG�����S ���X���������QS)����p1_8Bc]�%���u�+����p��?&��u\���>�N�T��T;��M�HZ�8j]�������#0�m��4R�V��l��$��5�1�x:D��8��>w��K���w�]�/����9�����N�{��yY0q1���D����$�
��{��_�f��C�b��
�1 ty��\b��%1G$�����q�������d���[=G���a
�ky����Yvx8��T|o)����b�V�Me���t��BK�R���)��Q��M-����:���r�B/�aI�p�IC���A9����C��l��bJ���V*EL�������U��Q9����1��#��PRS0o
�z@dq�1�����?
�d�!0�3l���i�Vj����L����2Od�S��Z��5�v������o���5��-�����!�6���I�6�:�K6�T�Y�.�!/P��J�:�<:���8�;
S�$�� D�D�c�*���$���t����������P0�d�7O���������$��,$��2�|�Z�;���KZ'<�]U��^]�����b��T��b��|�_���"��,z����e��
1U��V=G��19�SL|	����9��Avm��Mm/k	�����2?kt�����;"��5Mya K���k��R��������6Z���a��t��v������(��������p���4{}%�F:���=w���mrg��vg.]��B�n����r���\��/�wH(Pa��>������Z�~�K��6����,������%s�?~����5������%>B���\�a�&�I���4�� �D�-V��
��Q
�*��7v���>O�+���6��\&���~���o������:��_?�~����oH������o5�S�q�l�F�������|^0\Rb��	F
�"��[��h����f��T&ov4E$�Ag�1H�1;�*
�@���8��+�y�M��2k%�B�^
@��`��s)���M"�G����	�1�����7���|���a��!��2-:.����,T��aA�@/�02w@���L�uuf1X��XET$X���#�A��
G����I1����}0|�s6}��W�������~������ze�U��������������&���=� �
g{.������__ C����>��������O���}���*~Ljo4�'�4�P��%tC����5s�)�RS��z�[t���$G
���b�y�U'�2N	f;���Y�6�}u���?�������O������VX`�W�w������/���?>�*.-�y���v������	z-�����Z���k��9�o����_n^���}�6���(� �U��W�l8s	����e�S�(�������7w���y1��S}����V9G�;A��e����H
���\.J��h
`/)r��[,�E���A?�6�oH�
*P]j����Y|��Lx���
�h�1�����s��>�T���N�LZe����a�B!�[�i�|��H������)#�B^O�N�����da����)�"��Y`u\��U�����V�4���i��;!�"���r	XB��d��\�|�3�$�^�`�z�4|;�`8�����T�SfC�=X#`����a�`4E�|3p��H[40K�G�e
�CF����u�.0J�K>c�Q�h�	dR���)o3�I
(�v����D'�
*����-���i�u���u�e����n��0W����\����"��*�=��$�����U�k*lg(�>�a�0a�����b-N��qE���������8b�#�=b�j��G�|�$�L~�/�=yy���-��|���y6
�����<��T9QNFo��Vvz�SsPPb@���#E&����	8��f�[���
�~x�������w��
>y��cZ:��$odj���G�1��(i��9���b�c��-C�"��,#��)D	S\�J3�����r�G�Nq2�m:��8��<�w/W��l����� ��!|��#�� �� "�`o}.S���%�S��"��4-�-3�L^�z��3�%�����
��`�� n1;QYN�z�xY��k3��H�E�df5$L�����������B��
0C�iV����Ny$�4=������y�/�O���\�)x��F���"�I)/��s�U'��v.G���M<��R����+���%�Y������������5���>������mk"���3���p]W�r��aj%�+�D�cp�2P����8V`
��+�����Li���Z����0�o-���5��%oo�y���$
S���t'9�9+��:��Fm�q%���W�h�����t^�~5K%�b����(��'
[O <�Z>�c_n*�iA�x�\K�����3��~����[��`hp��5��E"��g>z�LM�fom�)s��%�����m:�&T�n(�(�6��Do��o�D]�&��������i
K��������K`IZ6��
X[����rL�������c2r(6>��R��������������]/����js��	~�^�2����T���d,���C��������c�`X��f�aec3�<v�(|;�iu��N�3v���D#$����@31�cJ�2s�u�|E`���o���qp/������b�"J���#�%QI��$c�U�}'�J�[���I��T����(��n��B�t���]`\���;�P����\����q���L1���?�����?����q��8\�� :"4P����+���HB��!��S!��:�����!��Bz�%|
�$��������tT��J��
�����S�?5�?���C@"B��������?����>9���
]� G���\��V�X)�. ���r��,��T	�7&��n2�,��%m1u������Vd��y:�"DjP�dp�8t-4�A��i�r@�F"K��w�m	a��TJ�X{�5e����-��m,�bp���]�j��0������S>$�L���+���v�['b������#'�����>��
&J�F��~���l�)����P$8��7/�!�)���R�l��)U���K\0�,���J�K�c�^��g>00}��.�@�*[N3�y������ [��y��O�������88feD���RV�N3����hD<����`Qr�`�P�0�Y����"����u��L})�%G~���;���qoh�U�t3��v;
m�=U|'��(�+��{[_�owuwe�����1��@07_�S�.���h��g�4���#j�����������xj�l���y���QK=D+�����n�	�iF&C����v�uu��Y(���4��l����!�?��<}z���O��0�^O>_�@@jS��fi���c��\2y�`��@���4H��H-F�kw��g�X�}� �d�05�*`��]�������8��L=�� ���B��D�����4�;2~Aj��(������}<F��"~b�#�D����b&<*���YC������$q�6�<�����4B���[31����9���=��}B)�[���%1����CE4�0�������z�&Sv������c�s~2�L�Imi�XC�\F�X��u�2ed�(c:$U	����>�[L�X������0���l�j����}�'e;�Q{�����-�5�i �{�#S������9\} ��-qe���u+z?6�p��8��6��,���f
��o�v���Jdl������7�:
{�f���11&�T���>��!@���D��I&u�:�\���
�_��5Z@��1�������6g�y@Dz��J�4-��+�5�b~M
���FB��Q�o���o�o�U��u3�Y��I���/��e5G�H��@���m�x��g�����H�����S�y���PoQT��i���n,�:�/"7+^U����325�7^$��SoN����[=���uPZ����+|���[�?���w��d��k��S�:��cJG����Y�yT/�w�F���TA�%�G�i�d��������g��v�hl)2#1F����?�OUj�@�������5/��>Lq3��&��@��c�~�~�$p��N������'�{����x&��|�e�#�k'����l���`���E��t�y!G_|A42���Z�
n&�-��*,�B�	�H}����z��-ug�Z63��������+�0O���9�z���c���]��o��m��M^�qoz���$�P�#��/D�=C��p ������$D�5v�tn������*������)a�;nQF����0_��r��O
�&��8�z��5dZ�im���&�Q�H�	�V�;&���,�Qf`Q�8|�Squh.6���\TE�jV�	`-��p�u�$�����E�c�";L��#�\��6�g�0�q&��*�)��@|H��1k���0I����$�$DY���1����jo>D�8��=��6&��}����ff��U��R)��	~�W5u���\��A�����~R��OG��DT��X8bt~Or���Tx�p�X��S�k9�"tQ��f���`|oAp{P�l�8������.��b.���B0[�#&��{^O���6:�Ks�s-���1W�y���$VS��w��D�s���TM���.2��s6��z^���+_U�;=��S��(����MO��ZJ�0S&t��7�atin���D�&vG_����� �-]�2)xLa�<�%h�u }-���E�P
�O��	f�]�FrPx,��v�}#O���O��P�-����I������{���[�I����w7������o��N/���NM���w�wG�e��g��8����KB�Y^��7��\��isw!>LF���j�x����-��&���+��N����%�����
�$)b6+�h�F���I�Lk~Wn��U��9S$���l�[��>X�Ocn��W8�^����o?��}
��G��WF�c�Y��+����H�5'H���5���^3���5�@�X��#��*�;����p��V�m�����|�M��M:S�:8�
c�*H%������K[K�:N�bh�y��y4]��D���V��w)�0vF�y�l�����0�3�&�L���C���F%��S2�gif�������r']�X=^����`	FO��	��o�,/�]��g����\C���2���|�65{����We.�o���}�CMh�Y��g'��lF�s�����"R�eU�hZ���H#�+a�)L�Q���?;i�k)�����C��`���3����F4j�������\����+�e��?\_��s���kG;y��;oVZb-Z"���C���T�����C���w9i$_��4��">��[��E�=�@+�����\�D"h[���@+������q!�J���B�c��@��+��OR[S`�5X�[�M	�	��H��-���n��vE�*��
YXY�I,�a�0����B6����H��u#a��#(�4�5#�mC�������0��R�P��#m��{D���Q���^�x�������r�i6���u�#j��������m�������^�������f�3[�	�`�P���:~v��?��7��.�������{�~};��wA�j�q9���A���{X]\���+�U��^�?��\�VvR����5S	L&���~%���>1HQ���!���+�y����Q��V��X�I���*���
������#��}���p[�z9�;��o������Z/������j#��x,���O�b2����p�����?G[y$��K��R�_���s��b��)�KoL�aYdf��,������^�>]G^F��t><����QB��B/�����s��zS�08�V�n�������ri�����N)���%	r0%L+�qvB.��s�W��f���mN�;��e�3���8����4�����
�i�|��4W����tF��7��!���������H�w���8TQ2�����p�Of�sK�TN�����fd!w5� +bN�YYi�;�R��eh����F9�@8M#�1�N�������b>*�0h�Y���L��E.&�����aL�����OI���Sh������VbJ`���gfI�s���R�CD��\)���-a������������O����W�\\����U:z)
[���������z)��lDN�������/@����s0^��<{u{$�sf#@%�9Wam��!���
��=�'1�?�C������^����V��m�<0P����a��F�2a��T�E&����B#����:vk�bY������c@���",T�6��c0�??{����o��a���&-;�'�f2�d�0�5c��)��u+�~����/S8���$b�O�G�p��2��*��k�o�
G3�?z���W�>����������,�������@5�|���w���:P�pj���/���R��V�z�\����+B��\������R@hR���v�[�u�'��r����L����v�����%������OX�����?��`��z��>K���D�%�_�A�>�#�����o����
�{���<��K��}1Urg�"K�1!%�M�*?�x�vC�������	���c������:����sbk���*�h�C���5f�~��M8����Be�Ku>�`��P������z���<�p0pT������L������z�������o���~p�k�"� N@sj�EL��P#:�-��Y=�7n���|&cgbK�n�P9"1# /
w�q�0�4�o��
�Q&X8k{}���LoS	�4F�y��������.����	EW�ZEe<X��:+���m(��������a.������oV,V��������e�.���"�=�N&9��?�����G)�c�/q���\��gi��aO;���=������0F1/��R�c�I���V��h�
*��
N�'	A�T2��������C})�o~{qo��Up������7W����1�_�����W��\??{���O>��Er�j�]�j���!�0IT�c��5�����D'e�h%IQ8��jcIuI���' ����4j���(�2�]�w���r��^��i^e�&Tp����4�J0��X�`����M\J�X
�������5���5k�����a�9�9�����KYC���p��N3�@iS�Y��@{
�H�5h��R��Z8�|�<a���
�
�d�*C;p����F@)7N8�vjx��Z�8~L
���p��-L<�D`���3
tzM����v�;��q�
�fS�����~�a���
Cl�O�15��}a�����+�����EZ�8\	{���0!���?�V�<:����|�3]���T���[)#L��[��y���s�/�&�M4EB����T�x0�m-OLq���:W��V)��.AR��$,{��M��G���^Ly#�UA�EO�1���`��a�����0��C�T����$|j�(Y-!��)"�V��'��?���4��Cb�-����Q4����i���X�>}��v�
3�Q����_�o�	�w�23<y
[����^��,3��Q8&��^�e�V����.�x\6��#�P��mh��D���3�k:��V9�"��`^����`�G�PI<����_b2����0U��V�����o�����=U�'�%�/�$��K������9Z�&����7�����A�I.��.m�%�D%A��P��{K0h�����WL������P*J�-���CX����qK�3�!����Bmi=�k9x�����Q"N����u1F��C/A����-�Ia���MdF���s��G#gv���+�\��s%!�($���>�x>��dd:�b�>&lu��vE{?
���^?y�3zo����I:8q���__�>y�V�P}���W_����G�������;�������?���iQ�����]I�O�>�*��5���_]}q���@�s��>���x��AIuR���O���tzY`oQT~��������!j[z����0��d��iZD��*��Dv���������]��z���v����|s�����K��W����<�z���w}��7���'���'/o���uR5�|�_��=zP����a�T.���?������'>/��S����b�,'������
�N}V;�}/�8��)V~a�QV����l��+�k���s�"�qSv�2��@�t��z����������������/�/��s��CbO������^7������%r�����`B�f�g���g
=<S�k}����H#��E�x�lA#+�e1*��Q�i[H��?t�h����<9��b{��VS���B~"�S.���H0�P��A�	�1vl�KaB^R��Y$���=E��v�m��Hm8�Dw��bvuL��j�y�ef��#�D�Ah�L��c��j+t*���'��o<m4Q��lQ�!����PD\�j�8���\��-�cC���I��?�,MbQ5c�����X�:�"",�6&H,�JR
/[I���EuFD�N+���A��eR����U��e������T�"��h��"7�����ja{j�9��m��)�`�85�0������8�t��I3��cR�f����%366������n��/�o2���,U�Vu�F��0��C*8�W�&�(�
�����(�+U��op�#D,��_�'B�������N��t�|��^Cd+�.������b�#��������r�|��n<)�m�Ni�r��m9�6�x+��p�[�XD�ga��T=��@�X�����mC<���G��s���s���;�-��6l����[�9�?������>D�k�
.,W~�g�>��KoFr���'�e�����T53�9�8,�2�&���/���/n������Z���$�����	r����5{��T�����r;0�d��c_/�S�UNO����y�g�3��L��������������������#@����o��������Q�����)�1B�.�)@�5'��
��P	��
'N�\r�����j0(�M1J�K�����&f�2e`�,"wt�
�e�����f��f�EDz,`=���l�B3c9�^1�����H��x��w@J��^�
��+k���]g�y�7�l�~�1���i�Z:��K�:���!�x�J���c��TY���:
��Lt��M"�g��[2u�5^4u��#������M���S����o���������������`U��&-�|C��:�$f�~�Ibt���Np���w����o��9"���7��t4Ib�	�����W3|�2F���c2��&��[D�A�Y9��t�*N�h��	�HrY�p"b4�����E��!��$h��a�.=>��k[GOE���c0f�� \IE���=�gK>~j8-���X";��zb8���Y�x��@D����v%��z�3-z�5v�E��Yu��D�)r��9{�	���f�0�&0�����~XBU�
����	}/b^� 5{w���!2,5R�?�8�����Df��0;���fs��eH�]N�E�.�q*6Y��.�>��N�������\���S�y����w��xllj�I�}���%�!7�����@����}�E83`,Q����}8��������G���<�����E�QC|���~�����.����H!�C���<����}u�^V����e��k��xd���E"��`����N�L����F4!����0���M@c���F*)�g�\��Z�:����K<Yf���^f9W�
����-u�`�#n���K�bb;I���f�Y�vr��h
���&L���V��3��2)��
2��1)�V���8�9��;*TgC5.�<V"�-��b�*�X�U�t0w;�f�0���rj�p{���o���g�������+�?oJ��F|l����\Q����Xy^�$����0��&(��oJ����;����$ �����������O�������1]vu|��O^��L }PQ8��!�A�c�_^����s��x���x�'��j.�,��~���g��k�p��������}�����<D[yV�|��g���}���W_<���|�U\p��X5,�������k���9����a���r���"&W��Bd��2
(v�����\�����?���O���O�����g�b����:Z6��~��IKY���kD]�X����6�d��
�b��r��eJ�	�4�m�����'���i������@au�i�Bv��Q��A�@��$&CzS$��Z����@��
�*����J!��|��T�|;�[��p�P�r��d��40A�D��j67���z���I�j�T)5�-!y��)Lu��Rdg1��p��6 �X"��0MZ�����w��5�����xzLgW����3��b��#@�0s�]������\����X�[��,��HdH���.��6�)�j����&�k
:�x�h_/7�N��4���R
�`.5�
�����0r�%/N=��kC��Va�"9�i�E(NG����r[e���.��L�u�"Fh8*'2��-tz���Z�����������%���O,�k�C��J�\p����8�U�Ac���9���`b�����3��i�
�-��^sI#H�$���h�?a#[(�	Y"r(��knd%�1��M;����W$��L\{(�|;
q� �9� %�R+�4qc}?H	y6���0Y�E���>v{-D$�zb��E_!�R
�g�M(�����q5$��<e��j���Ntg	��m���o�#��o�\9PTs(��6&��f)',(�jit�'���b7|�-��o����A
��;�0�l]qg����p�P��#1���b���1g���Z�;�r�9�i@Z�"��Z1��c	�D!c���QIIO%5|v'����X�F�����V�4�$Z�;��������.�����vw�O�_������������:�Xx������������u��������{�C�
�[+�p����C7V7Tp@��J;L��{���y����e����X�����)�9�]cB���5j��g��xw��"	���fQnHrO�{���r�C��j�|�
�98�Z`#���(�s�5���
z~%E���5IEL��9�������������c����+!��C�1�����������SEd�%��e�R#�az�}r�Z9��9�����^���'Z������,�����v�0W|Z"��D�<�����Af>Ll��x��~/`R50�8��82z\����!����7I�wc-G����u��f�gD�b���g�Z�
��j�jad$V��^��Py�A;m�[�&�4��$���
c)���x���7+����F��,
��c�2��2���g)�!G�����&����`�R|n4�v=NS��b�0ekJ�'�����@���`��X�<�)����������}���,��.�W���$���a��-
]N�kK��@V�Z��Ul%��O� ���-C*PAq=S��1;��9���?-aJ��������\����z]Z}��qXm���Gi!�X�C�������7��!'5X������1,���c��������_�@"0nk��	&f��U�F�D��H��f��/�xi�kn~�y����"�����`=����h��m�B
��F�c!\������u�u�0&������Ax���mb��1V���=&���^u�Zk�^�WXF��5fx:�G���D,:0��R����rF����p[H*�_s�H}+����u�&���2��@�����.Y1fE/f����KQ����C�P�l���;Us��#E1z����i�����5P9�
��45\�.��.��e��d.0����%H����}�����Sv�8%$!��HP+��(��2}�p?��w��-d�E�N3W�B���U\=Y�
e�{)uw}������xZg��������;/M�>b�-Q]��dB��k��6\�W�M��1�I�Y
�#�&�|��f�*I��&sy9I&��i����z���1S�������%�SYS�XEW�V��P���7���`�e����0����1���_`��_�?��f�������Rt`K8`� ����hW�l�
C�
�������;��c��*C��r��L�9�v����:)_A�cm A��B��K1�����V�Y��i��a��3��P`:��v�wJ��7T�j��9��hr�R�%uy:9V���k*a1�`X��d�qF�p<`��Pp�*j�n��Dbh��0k�B����8�t�5�}��P��$%�N�%���'��H"2z����G:�|�\0���L������Z��|��e��������
e 
X
l�#�����#�3V��2��`�+��H��vCJa�J�(��U�
��0�n����� �|}�]��/2����|1���q�
��"K>�K�z��z�	.��~�����X)j��z�8#�f��9�a[�����[���TXM��P�PXu�k�:����e�'���mj[$#��Km�l�q+8��dB7�D��\�k�gb�T�����������o��RO#�S����>��e+A��qV!w5G���_2�m%��m
��}��DJ�P��i��0��8�
�p�<x��?���o��|�������������}���=8����=��tS���=:��d�
eE<'���@���������m�E��@������#�S8�.���N���c~'�?"��?<T\������v�
�������o�0'f���!���6�0�
7k��XY��;,y>��v�Z����\�� Kc�m��zI{����,v��~�r�}�����>��z�V�'�A�0+7�7E�'�,���H��O�Q��3x��:�������h!��ia(��N^8;f��������A�D���JMf�&X�*NsaP����n�u�������I�4�)�p��c�#��<E]	'�S�H0#[�A�13f���f4�����+�c�Jg�6�VD��Ya[xC��98��D���\LI*���B���������7�S0����S��f�y|�����w��(�D�A��������6C��r����y7��~	E-�IB�LmH��b�l4X�1�F1�f���z�\�KX����I	��BI����A2vF9.���Va��)XN(�B%�#(8���>��!�}�BB�f��} :�
{.���K_�-��Oy��	����D�m�z��P������N$�c0
+%�������G��������S+�*>i�]Xg�N������Y��yIS��x��k4��F���{�o+���7�1C���#���j9EL��Sg�2�b��qX_�"1�!;KC���)��Lqt�&�ay])�xN#f��.sS'V}I2��t
��^*�r
������q|�u��E�(�Y"}�=t���m`�p J����]���'c�Q1�*P�����OY�n��\��A)�5�a|���m:]=�n�$q��N�Do��%��x
d�)Z=9��4:w�����S03P#��;;�K���F5G�I�hrd���L"D����"A��h��o����?T�(����%�����%�s��y���G|��O���R ����&UJ��O	z��F�O��Gg��`5<X��[�����}W$�lYE<YJ��:�]"�1w>�����0���u/
��$�8sm�O���;�oO
��G
V��OY7�A������81���i��������9
��1�9�@!�: }�4(�j��}[*����Ti�*��2p
��
������m�pL�Dw+�Co��U�3Yn&5������gc�l�9C���fD�>[�Z���8V3��^�5��?��4)@��?:����	�v/��9�����.�?;Z���-������S%�������b�M[����B���������@�1�����L�K@{ �jD���t�a��J�<�G9J7��EP���I^����c��x �yG��0M-�`�vH��Fx����LD�0�$2Y�g�PH{
]8H�o���tHQ5�GtX�������a�(Z���
���K��������J)$����+K���g,�4
S�	63�?���`��6����.[X�R�������c@P=�!�H>����L�F;�O.�?����CB�KPe�R^$�A���~��?-��i0����B������,��������������������"�4�>���8
f�0Q���C�a��!��u��`](��$}��������;]/�?����w�l�pv2dUW|�������3R��(��K�yZ��S�?�����P�D�'N4���.��s�����]@{Z�9��<��we�)��j�\tt}������>�kD��}��,>~��q����ks<���D�|HN�*jcy%ND"U���������)sQ��A>>#6����j^&�t#8N&"����|����3�B.���0��h�wd��{��]�t�����2B2��)R�,K�P-�y3
�Pa5�5(������l
���8�����gF�N���3���(N<�C��Jq�6�j�@�K�x[���~a������x���W:�Z�tD��e�|�Y���_��uX9
S�u�����3�/��%�;��O<��(�v��jA)��T0i$G4�@s�3E��W���J��	#$?�CP%���X�Q�4J$��4����z�(g�x�1�7r�XhqY������$��W`B"
�����&r2��(�X3����}���}��Z����}��>����_=����/���|�����O]���I[�t�nK���"�7�0w�3�y��<����A�����E��J1u��&>����-�@7}A^LNh�����T�'�)H�0��|���u.Z<A|=:\<n�����&[#�s�j�i��e9��<	K#�`������t������8�C�4�3��T����;�!����=I��N�C�����0��m
���F�uh�(/����"8_?x���Q���'��_��7o����y���b	�$r.���E����RB�5���~�t������	i;LD�;!������b"N����;�s7��`+��S��d�A����#���	�������,�����/�x�?�gw���mJ������M�Q�t�d�i���/�"�a.
�����|�N_�E�cq���R����K���k%.�_gi��Q[;m�����<-9W*5���a�`���������]W����p���N�e�(B(�c��U(��S-�h2�����A�_W���X�	�y�5e�,f�)�5a�����^SX���u:����O$��[h�&n� .���M������g������<L(��	��k&��2���������0F��-�����q�K0�83)�1A�����/	���}7�$!�;������ 
�lZ0A'��f:�
��d�Az�s�)�������AiJ�Q�J���i�\��CX7}tF�8�4�UU�y���N�y]Dr�XX����������oN��/��
���A!�H/��9Z�����)���3	
-�����n�z
tEGh����#��n[�g|��h���7~���tz������bR\��q�_o�|}������~.�����im����BjoJ�t��D+�cw.�� ~9�n� 0Xt�-(gMa���&SE��h�RA��V�Q�	
���)mN�9��R�(8b^�*g/qL:��1F�|F��9�Y�`�`�|�r[��)����b�-u�0n0�;I��f�Z8[�{g�>�f�l�9�P	<2���P�����RJj��;E���I�����'�i��j;x�>�;����F�2~y�!y��������	
P����	xq!��Bj� H:�������1sU�W�l�%vc�K��uG^���:(�KE<��L.�������\���Zr�`�O���t�C��S��4���?h�1���B���p�E�qn�<����i!���}K�;}��;�P�;}nq{Z�������F?Q�I@��5y�	���|+��NvN�X�����t ��3�;�4����)�k������x����<���45�E��	��8����
�?!hG2��9�������b���/�Q��4����0�c�{���������������������R�����<~v�dOr��'�Jy�I�Q�:(F��`�%&<�����ow���?�D �?JA	
��\���[�QqiJ;+���K����_��]�rK�I���v��{rj-g�A.]���!�|�CU|��,s|�������I������.-8L�����0�f�$\�=����gc�RW]pTu�q|���T!n"�L�G���Y�Lj���Q[��)8*�����gW�w�������?�������)����O}y���KX������{��G�}����|t��/�������/�����?���S��o��_s���gW2pdc`�c���WW_��#<��� ����{8��aPR��m����,����^�[�a�8��ZCYZ�������.$2k�>�2	XYeM�����/~��fxk�>���q��A���/={����oO]��z��]=�������6#��'���'/o���u�.�x��������_�pa��'�u���K?����0��#|��O��K}����;���%����#�8���S����g�.�lt��_X~�����<������y0r��l%�b����J��(�K&��e�>5��U����������c����?��-��;t?�-���_J��.*���a)�����X<��{v=ag��o@x*���b�#�B�t���
�p�kA�0���T�����g�
5�-���HIs�+���T�c$1�T����J��pYt��t#�m���U�b��l�j�Z�~��go�����[|�������^P�@�cKbv����m�~����P�-�OA.W�����y���3�Q8_��r��0BI.:�/a.���J�8CN<�	L����L�Yus�m�B�=Z%G='��f�e����gV����a���NI����A'��[i�������,m����.�/{Z��SD�1����H�_T��i��/���.�CB"%$��'����J�����^l��,������'G�PR�jnc�����O���[�����y}���&�����r�$�L�2�;�LK:P8@�&Z���+�-;��.�����b�
���:/M'��dHL��d��,��-�B��
n�J�����ml�>}�����:��������6�#r�p�l0�3/I�	�������@(4�*
�*E����L.�4/�D�Tf���b@ �{o���`����qJ�C�M���-�?��P�_p�)�{EtM�3����<s��}y7�-�R��!8�M��uB�2��+��E�[M��-�H�"��V���i�����hwG��%#l��O�S��-l1�Mq�!it;m��0���Bid�}?b<���F���[�"�\Z�
���C����;\�F7A4�����Ud����&���O�>���O�kG2��2�.o��6���#��1�EN:��W&��9�;�,�x��r.�~�q�P!��0�o��fk~k����������,rU���-���X1T��WY�$�5��<��
:����� �<7)���c�T%XY � 0\�F16w���8��g����e[Y�wTW'P�H
m���R�
k�U��ri���n�5�ZP,�������,}A�X�����x	�Z����5.*7��/���3G]����59$nI$z*�%A�F�V.<:��%����O������H�����������m�����7���y3 ��L�A��k�w�L����������E�����Q4��>�
���|���E2������
h?{�����}L����1�J�a��sV���������?�<y�0#g��L���z�FG8�Y��a�0�#S��Yg�/�vx/nJN��e:N�
���>Pi�Ua�@EH�g���vl��L�St ��
1����
aA+q�p.)����h}-�L�{k4n�
`+�/]d����A��Aw��s#s�F_���S���\�
���N)�&��f�c���!��x��To�oc�Y3F��W0uJ��8MK2#�P�Af�0>�C,�$:{��`4�Y���3�_��P�N�hO��4�����y	m��7�(�4<�0���}�������`8$�?+��������^@�dD��0^����0M���V��{�{�X$��m��ptb������<�*%�M=:�eu���Pd����-5�|<A�u�Z��I0�����u�/M�+��3�&�w[R�v���>�AVc��9Vs����f�i�v(�
�<�+�t?��Bz�bSXs��{���v��>�0;�
l>���K5F����RR���N���0���G|r)�I��[���
�A=��iq1a��Q�t���A,����f<m�k����L`�y[	�XP`�6��0�c=�#md���J�8�b�N���-0����L����?^k���>8����������qL���N	��r@#���R�c�T
sP�;)�Gw��
�}������
t5������3�U��|;M��������f>p��D�b�.N#0�z�/�V���! �O�cjq�C�4��	Xpk������?��5�z_a_���fT����$h�wZ%:���44U���h��;�U�$�	�~j������.���il���0M���R��I�p�tA����E��3��/z����2��b3k
 (jc`����m �*�b=r �N������87��3.C�.4���E�ol;*�4FK�#��`��]n�R�}��]m��G���� ��P�I�������-���I���}u0i�v(�Q���b�^�c�hu�@Br����)@���|v��������7W��yx�-5J����y���M�����X���c0=�`���y��0��`PJ����@�z�\�)|h���8����}�p7�%Yx�z?W��B���V���1X�{�9r�Xe;��E��0�����p�8(��Npp+��c��c����e��m�o%NY����9���X-;��g���9+F �t�+;&�5b�C2�yuEJ^��r�����(�Rc���f�h�+��U�l��oQ���w�����-�AD���s�z@eS�r$E��p�:&�B����`���0�m��$��M!��m�1�n�9�����A�����o��_��B�t���&%�u������`J��o�������}��v&��T����6X��W����(�
"���C�.�F��x�=�p��Im��]}>(/���o#x�S�/�V�@ �0���=�]���"T�)��Zr�T��W��$�Z��\��#������>n�����)������zN�+�#��md�x��L��'d)�����Dd��p�S�J�p]�|O�M�����l���V���A����������O�M�C��L'l���u�WSc���S���������q!���s��x,�o��=-�1��L�a��L��3�OY��MDd���a����!�K� j��">�Xx��H���1wo�������������Q�}���985%q���KV���N/{?J��(m�?�f^����%b���H�0.���j�	Xbc�N&2z�4XY[�|�� ���a!�f�:�$�PR�!��re���K �t����5���J��E��SmOY*:J����2������y�@��3��Ra7#|	��v�3�(T�@�U`�)�'�&�����o%/h�e�����EO<�W10�u>�0�Y�YLlQFc-���S�ukM����l��s�|��e/��|���XIX��y��������K"(�/��	�ISn��	��������$,���'��G��UG�~-5/�4J�hWRL`5��rv��&LX���=DI�+L��\j9�L��������uCnd�c��/@��@F���� �n��(9;M�������+w��e����|���������7���_]�����.:t5���B�����C��8�}�.c�����F���1��:Tr�����hi��EK�2�>
%�A\��:u�J�"����6l�Be�\����S�R�)��;hh�-G�]4�	�o\%y1�p�4��`4�Q"�����&+���D=�Z��,�T����BjK7i�ss^Q�W���X��gZ�vL��(�����R9��NQ�w���-�z/i�X�r����5NT�w���AH���	i;���]'$����h,���w&�9���8����&��y+���N*y��>i��7Ve�?����[��/���;k$�m��v��O���i���}��	���������@���e��?�r�;�R�Oh=��(�����Vm�}�N�??�x`��� rzp�i�u}�$j��&
��C*�.vk�q8���E��B�g�%����N�C��^0!��$����k�`6h��F��1}pG�;�� tuz�/�5�L.�0��y$�Zn�p����T��D���C���i'��:������Mk�0i.�0����5����7m����72��������h�!��#E
�Dj��F��A|+��o����:��,L�,��d�G��b��BEA$�R�AP6�qX��A,��D��o�T����!��aY�9�!8���P��B�FQ�.+�T�3���\~��K$�4U�,X��E=6E$��T���:�a�/�4�k�Cwx�&�$eUX����X
��{`{��=�)�jc� k�#2�&�)�
�V	Ga�������;f�&��	�:��5U��c�f���
�10��c8F��<N�f�8'��G9��_�T�
������8���@���aJ�;R��`
bc��)_�o���o��-)e���^�lZ��L�U��-�����>V���T�V�p��EC|���JO�CNrvBY-�{VC���c�z4I�S���r(N
���A�����&�b����\S�����V���)
j8��<�(������EWM��XP����sU��e�b�N����C���rf��.��G]��%�Q�F�d%;��qY�=��*#<r��`�}���t!����1��I������,m�^]���� c~�(�+����}(�����N:Vg���"��s�I|�<y�}��\���������<��7E��^���d�	��X�	k���O�!nC������0��T�9��b���o<�4N*�85��2�uqN�����F[xC�H����:�7�T�Y��W����X�1��6@2��n
����v��=��f��������=��a�K�bk����<���-�E��
�D�������@�j����\������
�8u��y�,����Dz�� 5� 5���Oa�d���;�Mh��&Ct��8D�U�!f�*��#G$�A��L���5-���a�2������)�P2%�LO[��&���V�����A7`f#�Q���
�QG��,�~���m�
�xy�S��o^�+X��a4�CVG��@����%^��Ds�m���0�sv|1��!���������`[%k��z9�Ac�M��7?O ��m���b��
S>,�����jn���������7�^��0����O}z�������������.M�W�X����Z@6#��i��&
Z����6���@�4Q�%�f���R��
Y�*V%��8�����s~�d�b�F�^��wZJ���{Dx������Yu��WV��'<�����T��6����S��T�H��i������n�Pk�{�
���F���cv��������%��K��#��mXj���y�Z,+����l�]��XP0`�X8b�Q����h����T�eT��AD2�$�JD#U�l�#/m��=OFt>�.��i
h��6^������4�"
�����P�����$�������w*=~xv���w�x�=��"���{���������������lz���t�8���M_���{T�	I%u�t��Gp^Q��,��h�0�
Z�1/M�f�S����|�7�.���c��m��ZkG����i"�I�T
}��E*,��Ax*����1�q8Eb��(W�-��HA���oC9E@�qb4�i��B�Z�����.NS$���4%�v6}��������,}_�Zq�=�c��������q�9���)t�":�_6�7Q��>Q���vm[G�����8��I����?0{���!��n0��E����_��	M�!����}�o�+���@1�o����"�[t#�s�q�F��(�=��c�p��e��W���pK��7���r�����x�j+�����RqF9��IS	�)~<�)V�cQb���!_UFR�t���f�`1od�����u�R���g3q��*���{�'z����G���W�������|���/.����&^��+�;"�3;=��W����[�g�T�S�O�:e���S.���p��WoOy�����p��yl�SK���N �k#m��K*��t�����F[�X�J���)��95���%���l���g�;X3c���,��.Q�I0!b�V����d�%��(�r ���W`��#��*���w�j�x�,�����6U���`���b"�e�nge���������L%�m]�������oso|-������KX���KAE^*A�'*��2�W�~x��i�~���'|��:p5kc�����O�`	���?�*u���F�V����K��w���h���sx���A�Fh{+���{��Hu7��������s)���/N�	�M������W��h��yI�`���,�&��q�PT���	2(��5����8C�fnd����
�����XO�7���7m�d2�������*�w����p��#V�����f%�O��P������h&xy��v���2��@K#	��3�a�6��z�a�5�=��Om�����!�$����x�2�D�n���U�?T���-[M�YGb��e�c�4�����n`�Z5k�!P
5���1K��X�e@�I�C��5�qp����N�����>�����g�{l���(�h�J���C�D�����c}�@@>O�eG��#�O��!����@~��\H�vl;����#7�>j��	�W�a�|FG�����#������%����#�$VO����G��"�^��	!�)�
��}N�������F����	,�
��4{)����`�&��C����s���5Z����Z�]��p��`rY�����sN,*�F�=u����/�c#��(M�	�1�v�(dh]"��n�F?6v���7�I�_��K�!�HW"q��t�I����"<��;7��Y�.�hP�Bv���b����85D��<dB?���C��j�i�1ch�[��"Q%h�#K�-[C��5�����E��Iw��ZPDb(�4L�]L��O^HkW��X6���B�	�:��a?�����%ef��1\������2F��:��x���n�N���~r��g��w<����5^u�������O>>;���E�+
7���EWy+�.�qSr��&��QF�IN9��z��� ���v[�IO�p���3�S��J9I��9<�����7�E�f������q+�%���1K"����[%�1������?&r���Y�["��9��������>|m�VrSL��`�Y�M'���n���(V���3}�n���@@G�pDQ����K��~{��2�m%�v�%�
�=71��9n�N���r�}{�O+<��'�~��|g~���;a�5~�QD0��L;���g������2���@t�-�,�f��r���CmCV-��|����6#���+F�3�R�%M���rd����i?#vv9���ke���dNK��a��4�nV�
f��o�_|��B�jQZ�x+������LK:����]t��U�z�)V6�w�'@�W�O�-��1B�<�4�8��.�D$X�����^*������F��@��d��@5���G��������v���K���<Lr+T����K��&'�����;jC'z��)+`���?��4c\����t���)z�;E /����~��/��L�������O~xr�-t"\|G��G'�t�/���0o����,,�2/��8����X���VM#X/����DZ��Q`�^	�G�'���z:�D?;�L�Z%yKg����	`���H�`�'9w���1��Qx��:�_���{������}AX)�������V�[���"d��_D(QK����c�nXK����]g��e��L�qK���pY.�K*g���9�*k8�������c�$�����(N�����e�\<������'��
��4���T+�����&�}��:��T��n���<����I����"�Lh]
����������\���GD�B�\�vKx�:���R�
G��+AZ�b�b��������
k���������w��j]7�����dT�
��%��{ ����^;4����Z$&���`�9Nx���@>����H�f�3��k��������~��}���jFw ��<���������g0�%�J��e�c8���
�
���A��s�\�����x�ZX��K�(��j��\���#�3G_o���[�����%�u��z�c�.�<MO'j�(<���[
��\+��2�e73���`/��k��K��P����R�=�t������G��-n�m�_]
���`X&��n�U��������w>�{�����|�����y�n��o��-�
3��)���N�t��O�<�&���/&NO���[H��N?
�K�����<7���[����N0-	��<g��]/UL�U�$���!b)���O%'L)>��whk��i�--P����b8�pV�"BC��t��h����T`i�&Y�]-Hd�}K��?����5�����Tl���h�[�Q��&%zV�
��p�x���w�P�n�*��)������������j�l���z��Vh}]��O��w�v�Gd
�h%e���m����M�w6)^���W�b����j����I���U�P�8����������������&�2��0x_O�D2��XAGz51
f���K[%���A:���Th�f����:�,�r�|b<2�i�|���}�
��&�aV��,�����f�zi�-������gOoO��
�:C�$��l���n�oE���U���t�^)����"*���Z�����*��r��q����V�����gL��6�^�����q%�?m�7���B��C��l�v����2_�4�i������zr\R���l����xw����4�*<n���T�Z��m�7��������}������q���!�ZU?���F����?��"�
!��_^<�����,��(���h�����������G�Rx�H8�R���rV���y�S�=��4U�u��`��R
3�T��M�I�.�(�@�_7�������?� ��9e�/�B)���'��f���au�����0���2�$b5F�im��rM�b��:�$�1#yd�3�Y �7}�Q<R�nkL)aD�i��|F�I������
b`����:Bm	�����U�����Nn����o>������e�f���������r=��*�/HT�9J	:y=r[�����_\Fh��oN����M���?������/._�`-�kU��Y7@nPu
F���)���1�H
&j�*
/ QSJ�^����e�Ev����b������M��f�'m�.��uh]X`@!��[�a�O}�r �����>9����O�;�B���'[��$��<e���S	��H�$(r,��<����������TY�<C��r�
5�������&���m����q{�L��0�U���������h�Y���������Ji8Au)u�����V����q_��z����U7��P2�����f�a��>� �����:�]1m� `��lQ�q*IL���Z�6JKE�Z��x�(�iP7��C�/�@I#��������P�%� 8��:������i�0a��r��r_3�'	�r�?���n����S�r 
��iH��Eb����7S&7�5�%+�*�i�0�,9��U��M)i�V��,��a)�Yk���Z;����d�l�>\�R4��
pZ��R~�?�lRo�[����"�S!WNs��s��@��q�'<����� 0��Ia���n ��
^�m��"�qX����'��0B�{RraAzf3I��\E$(�� %���IJ!�v]?n�=���v��f��57�	�+Wer���a������D2.��j8�V�mk[vs���+�2�a�i:���M����B��cs.����E��p}�J���j��m���\7�cQ�g�	�m����^�k�
�MDm�j1kp��}��D������<�tX�����o�/�w=�)"��G+*��/���n���VP��	C�Xps
���������)�I��f�������+����C�E���v���v�f^_��RX��)��������>��z���-�ag0{t�����*���[\���sq��;�����qj������0$�^R��40S�8��4��*o=h��9��9MMa��3b~�%�2�'�������+�2����y()�,]!X-�#���&���3�K������r����I_�I�	*',��C���	��	V ��a5*c-6]��BXK.��.������&��Lk('�:��z��y:w��e�0�:h��M��0�4,�Hai%w��y�D�Z���
���o@�.Px%��Fx;�7\9gC=9qOC_�����zYsJ(�`�F<��Ct�D�-��4�������m�J��+9�0
SC��BVp]MTG���P�P�{,a��2���v�};��A})NY%~a�\B�)��!r��p���[�w���9��9X��V�8�(@L:�SF��YWq��	�'&�ZD���y�i
p?s ���L"fss�Va��>W����K`;���|��"�w[m�����F���C��T�?���:�/P�R�
g����P�dB���6���Bq[��FS�ly���"��e���d�$�(����W�j����t���x`��q��B��m��u�d�����%7
�]���Ca[�)�P�v�L�m�y�&�b�HpC�,p|�c��*[�������6��!�J�%B�G���Vo�����ov (�,"��R#���"�Vsm+��a4�@���]�i��Z���X�:)��+C��=i{OQ�gh{b9=fF/�����E�����C�%�veJh�����(4�������*
���:�C������t�Z��7I�5�_����������IOM��Z�j����N��d�s=}FDBcJK���d�R�<I��8�����5�`���K������S�D��?g^�3E���*]�`�����}3��i$:�85���GM#Uk��!r�P�t�8�2���z�LC�N�@�� 4�F���)1qx�"���,�7pk�Rk-�	G1���1�Q �G��K����H�D4fn�n�q��~����'�bN	��`Q �����'TG!&I=��BY��=�����@�����*��[�;GC��:������S��rq�3xF�n��7���HK�����*6�����6�RJdB���^�UI��b��j��^���M^�-jr�|f����I@%p�<��!�-��`�B�t���Xh���_������1���vv��g?;��cS�0Q�n!k���"����a�8l2w&��
�R�B��7��?����9-#��L2�������4X�K�a�*�$����}&�O�
�SI�?�������� m�Ws�����0{0���� �~��=����`�����<�?��t�X�����
E���t����?LN��C�q#�">C���L[1{��J�����r���6����<r0��!a!3cZ��c�$��(cjF
cVh8���b��� �������d�u�Wl���e��B,�Sa	��%M�ig��&��G�����w�c�\a
;>�����a��zKv��\:0���,��1'C������)��"��6��4�6��s?xv~�(6?x��/�%��U��K����rr?�_��+-����in<���-Y������(��=z�8���M����p�6�\�6�����g��*�	��������}��k�?�����9��h���k���j�����I-�$�z�r��I8x+�%�"j�-,-g���r�5 �}��F3(b���u�[W$G4#I��r�����$*it���`������9��>%��+c�s���|W���Z��j~W����1�������W�j	C��&�c�1�iKB,<��g�oz�D��R��`��E�L���i�H�En��L����J�n�u�}���w���S�x�.�v�a�L-�*;c>����3F&3����>����qH�?�t������:���&�|����6�GP�4?����~s�m|A�����m�	��F+�,1O�Jv����Vy��I	f
&����"�i�-"������u[5p�d�6�,"��g����aLc����Th��tj��s���sbq
��9yMo������y�4+�
)#1��A�7��=�����������������o���>���h3w��m�y�7�����@�Fm����<N����d�i�*����w�r7��&y�n��Q��e�������C�(���}�2�5H}�I����!��TN(�)���h�@`���%e^H�t|��z��������7�r��'�����n�pJ��[B�*H=��<��3�$�l�jH��y���I
+eRq����Cv���������#>�D�"cI�J$W��?��'�����y����>�������_��i�h��e��	}��001�d���m�Z�aMRm�=�L��e�/��!�'��������	f���GF����\�1��"����[�i��85y
��@Z��2�/t�
r�h�@9m3�e�WF�1;��p�;�~��%��F��<-���c��b����������&;������i��J��F�bH./���N!l�������������?kjucm3�A�W��\��H.U=�\�dMU1'Z����Q?#�4�m�%��\��<���5�.�����;Q/Lc�G�n;q����"�v<��<�����Mh��st�E@o#����W�sF������e��4��qe� �e��e�R�X����l��!+��5���V$�x��b���2�&$�dV��x����1r���Q,�CH�����>g�
�zL�Z>����|`0�������N�����h��I�Q��@	m4njT��+������\D�!p���)F�U����T�j���9NI��bs�[�Z5�v�&�x0�H������"�n���
T�����Z[��5����#����kwf�3����kx	��u�����r��Q��
�/�����������[������_%�'�	������pkX����3[^QB���z������U�X�1z���!�C�f���5�c���]�M7L�z/3,��0d3
�=8��s�BQH: �yI���?,��!�r�L��2��s���R(K�%���'�N�k����<a�������ZH����I�?���;pI��K�����9���
i@�*�T��!k�d��ZL����)�Y����L�����-V	� �r�lC�J�Y�,��a�Y"��>�[P_��$���U�If��-�X[����`h�
dNN���d�j�lY%�Y�fE>���!����=�ve��v��}d���mXfc},R�� ���J}UXf�L��	�Ba���2�����S�}E,���M��2,������������
e��U^+�8�`d����CUM��{C�#���G^�"�
u�)�Y��>����6j'�T��2���N�aO��N>OW
���������\�b�j���&��^��D���J�#����7qm;���-���D�J�f�~��S�%1C�{���/FYso���������v���g��;�n&l��u�3#39#!l�V�iA\���c�������%i<���i�@?����]�N�z�y;���3����hx��LI.$�0$�E�9t�x~�MhgA��N�� ���SI�D��X8^q�?\�����1��\9����.h�}7]_5,�t
^���K�_��w?����o���?�K@�>��	��v���o^\>�k7��3�+�
�!���N0���h81���&$�4P����yN���������w��c����&w�"�jOM-�n�Q��C�C����$�vpy�6&�2`������9��"���Hx�K��a�U��-�1�������(��g|y'~����mA�9���������2��*`&,d��qK7��X�f��\����z]%�
}��
��b:�0N
Vq������*Se/�s��������O�����z���<�2)Go���Z��@9���@���.��
�0���e�ei��y:2�N*��Ht@�Y�R��`A���q����I'��;j��c3At��ND�x~��:T(~���o��D)�����E��;H��GB5��6L:����H�4�5i�g��.��,m�(r����!�F��78��[�a1*���pl�q�3&KC^���i�6�q�'� �fYl�R��� ��6��(��3�$��j������I��������E|g:cFY
v7���Q���P�����/>�l*-����_\#*(�J!i94�z���UZ���+����Zu�_��<�����K�^|������n
��fQ���� ���/%�b��$�kJ\�Os&�������AY+�`�Z����j�����{#�0���P��[������9~�Y^�nV�Pv&M�����@��@��j"G�����B;y�rj;�
���)��T:�<��#F�u�8������G	���'8�F�|��7���|Z��x���P��9����#$����8�����<�T�S�N�8e���?4�_�j�	]-�C�M]�w0��#;�8|@��h��/)m_�\�{F_-GAA*�lE�A��`m�[P[�c����i�6���%���yO��\\>�E}(		�F��LFtg2�]��c�`y��7mR��n
e'��~��9%�c+Y�aSE�mF0����`���������^A���N��_<�6w��<���Q��Q% =zWO�a!��eGj�|{r������I��%�8��6�w1����	�����
0���L���&`����0md������s���J����E�UBR7��`
n�
�6��S��.wS�n1�r� �q�����$&s\!M����z�����"@.�H��Cb��IL�f�3���8�<e��@��>bM�"�cI6%�XS�v �������Au�i"$��%[�Ei1t�>9!q����U�UG�n��hK�=o$��@l�|p<��)WS�	G�v����B`����
��1-�q���`�������qbD��C�;����1��<��|���;��0bv�?��$lU�n2�������5k#����M�>z
�"j�*������"��c��J������-��,��@���.���\� k&������6G��77���Y��R�j���.A���x�@�b���@�, ��@�}8�=A���@��S���ds�/b�*�#�u��tL=��[cN?$%���v��r��������C`��/�i���E	���X�{��pK�����nj�`v�#
N0'$u�j���Vy9f���,���V��������K�#d��4�m�8�F�,�xi WVz�kJ0�$�X;�r�K.�������TC'���MW���T�?G.)��v������7�\Z�LT��'*$\H�v6LTG����e�����j����Isf�I��>a��1*#���	�_}s�Ts�cyV����L���$4�C�;�0�(d�.<%\)GS��|����[���Rt	�S�Q��b��,�2���ls����a��k'A��u'�]�]t!N�I���u���#��"�s���X��������nTr�VXL�5�2�B�~PN�)��a���E���B��-"��\�,������� ���k�V;����1�����n�]���Aj#<D�Q�R7~�r��[h�������g��%�SzW����j��=��"�(�9�(�>�]j��[i3���]�i]�N��+�R�g?�PZ������|����Q���O�f� :�hqt�H%��z3����yv �G�������
U�JqaV���hhW/;�	y�����5+�������Z���<�|�V3�)���!�������y�7�FE�y��>���X��Pj��"1/��<r��-�^d51��%,���2�m���Y's���)s�f�+6d��,j���Z!�s���{���-�eN�(fN�2S&���m��\o�Z���+�����@�$��g�F"\�{�hBU�m�7f��},����hOL;��s���X�:��l�K�W��}@�3���5`�����cfe1^�$�$MmB�������~��]E��1�R�C��S��;D[���nG���-��'�TO���ps�<@-���Q�&��sbx�[q.Ht��]5�P*�N�$��4�o�1���jG�=kfJ�N#-��Io�������YI��9��S��h��Cj����h�f	V�ce�?Z��:�'�e������I���84�Q�F���0�2!8g0f��~���@��fcZ+eX����7��Z�L�f�rXuM%��|j6���t)sx�����l D��	q@��� ���n(���f����3�{�(Q�)�c��d��D;�U�l��K]�X��E��O92������h�X�R�y���b�l����gBg3Kh7���y:�Z��|��l���W��-M�<c�z2�\T��%7��m ��(V������*t�a����������p�:�D�$��W�DE��������>�/�W��i�/��^���������F"��b���=D[��.m��x	<�CYP6@�&S8b?E����A3��
=!tm�����P���PO1���JjG��,��tFz]�:��}��tzF%��M��+�IBt����!�tT�����pB�8�y����=������6�7f�0X)b�H����s����+?O��(��@RX)�HN5u���q�� )�eBB������0�*Q>��Fh{/z���3��	0�#�+GF%66a��3&�5���bXE�-�����J��j������7��H���MX����&��6"a?�+'<���I���a=L����������U\3c�|��3���J�}��������u�C�����P�������4\Z3X%�pjD��
�$�):��r;��t�'���>y�%���m�,����N2JQ;��q�
y�L�1���/�}Wp��Z��������;����pc��mUX|�5,d��n������`��3�7{����
�/U��S
<^��Y'qY#[`�������#���"�S����;^�ib|v�|�:��	�g�d��'�Y��^����k���t���u%�3�����2���D#�T���A�6�H�Jf�o���h1�^��O��:�9�
��!�s��y���������v��:�2���,�2�a�/E����K��j9������cWx����2�!{0Q�f>k�h�E�5�4Hv^pY�W�e����+��R#������Jv����|�����w~�����7Jz�]�Oa�����1F1o�~�g�����R��L����TIg2VY%9A_���d	��kU��2�-3��������&m�������Se��a"�g�ZC�`�&��y�S3K�D@%�o����t���3�"������4��F�
a�`D�(K�k:�� �M3���-%�����[X`1�9#�<T��vw�S�^|g6���h��2�J���x���D'Z0��erf~hC����|~�Y��5��9�[�s��c��	��4���"O�J1L��lLD��;��������e�'����^�\,�L &�$�vq����C1�.$F@	����?m�F��U;�0e�'�F8!����I�d;��f��d� ���M�	b���T�'h���t�\�ql�v"��b�M�Ag�)��^$�|��/��s-|m���oK;� ��,�X`���d�Q�y�e�D4����rj�Z��%2��S��j9_��f�k<j$%��?��m���j�[���q��Zx����?�IO8M5%��o�����?��}��a�O�j]����������<��~~v�����|����}�j�s�
��7I��f�N<P��1���"��U�]�*���r������~�������'?��B	?�����}��F�5oWI%k�����\E���S�c�Ep�n�K����������~"�Op��?T0��*��i3���Aa�_
WP��Y��rM��b���8���y�r��n�'4a�g�`��B�&P���Ic#����X��a/��
�n��W��-+���e�6�������K[�G_���y�*�C��s�1��g+h��H��g���S��E�������h^^��F2H�m��������4�B���y|�C�,LV��Q:�k��Fr!����H`��$"�h��,�x%\�i��]"+!����+}*�y�$S�\e�be�#�ya3�@�#lg�@�a�����uh��m���o�N�c�����������[y�Y�U������w'��d����������~B��n�U&�e"1o�K!0�%Q3��`l �F��a���������Na,�k�"��P���&LO�H����FX���Z%u/����5�y��H1`�e��Z�
�v>�g)#*���;.��x��w��
c��*a�v.�Q�L���;"]�K^\����Z�UH�y�
��*m:�!��R�A&
,Qa=+�L��9�CkSp��%�t�yZuF��������B�V�8�C��t�e�~���x���3~Z�YT=��J���3��6����DK�%�-lx�]��
��h���5F����t�����gV}
�����%��`��*�,��/���0KL��N���^`c�J��o:	���v�6�����]���b�,�;�>��L8z����+��|��J7�]K;j���7������n�(C�m��sNL����k�(ev��s�g[f����`e^P��}
8����s����[�g�A���f�'7�#��A���� �s��=YdG10�h�Km����G%b^L�"�[��t}��[KIL��:����D�v��4��
m�	��(-�[��Ii!�I��J�7�WZ��mX"�bIG�������������w11V��&lY;��Gk�%��IR�.���^������Z�Jb�����Q�Z�*�.��R7�p�h��.��t@�Jk�=���R�f��9"p�e+��Uk'��Q����K�1-f����<t����B�{zA�0�_�������;j�[���Ua�<]L�5�	�1�v<��H���A��XKH�X�cM���t��[��\���
nn,^��"nnp�D�n3��=!����W���������L���B����w��=���q���H�����&��z�(�)�u�z���v@�\���"��h-���J�*����}[�TYo�mY��
����+�IrlT�/�Db��5G�����2<�#���%�a �Y�mf���cP=���3�I�Fh�GZ�B(%�����P�V�����Y�y�o<K$������D��2G�@J�24����*
�-a\��M�b�������)��&������o��v���2O0w`����y�~:��s��n9b�bAe�&�q�����AD�z����L�S�{F�=v@�����ol�OX����94�f.���K��S���b��9����t�\=�%\���&��&��L�>|9��,���4L�������LO]C�1X�$n"p���H{�V@![38���+�����o�t�&��	)	��6�v�[b��`��B;<z��2MV.��DpXx�a������
�
'l�D���'a���p/��Em�\iP�5���6�U�9��EB�����b��Si�����\��1�v�RfD�aB�m��j�34�#"vcY��S3�q���������Z���D#��r`]k����X]z���;O��Ep�*����yh�+�h�0sU o!�h
�3�x1�u�j/��E��F\(�j���+�A��7��R�V�HU
,�U0�k�Q��)`��|�_���3��{J<�Rz�VR��;��g=hz���L�J���	G��4��!��6�j��2z�3�~��r�J/f�N(����
#��N�&o5�L8�e���_������_�_\br��?;�������

=���i$c�q��m������J��� ��\��=Mgc����u�x�S�X��>14q!���q-Z��G>�����O��o}�q��$��?��������@��������u���~�������n��I���5�
�w�{����7n���������uy���1��|�����b�t?����O����k��;�����|k���L��������������{2~������
������7?�>�|��������|����I�����~��\�z������5?�h~�4?������g�i��z����k����������/����+�����(����Gx��}��W�������R��(\���^���	��*�WF�����
�W���Y�zV��������������+���?V���=%�����a���U����q���+�����/
����������o}���kc���J9�����hl�^������u{p���/\]�Y�������jY7��l��>n�s�������a������]�F����R������7����+�F	fW�y�z0K=.���#of&o]7>��m0�q���fiF�7�
�vp���^�z�������Q�����p���W����W����p��7�p��{���W�����~���[_
�y����w��|�w/�5����?{���%���k�����?�����?�����O��WO��������x���_��=����������b���O����������=x���I!:���?����������/^�u���c8����'�������_>~����O_~��?�?���~�{zzb�NO~����S�������_���|��g������_����������l�`n��[��GQ����O�^|y+�h����������m?�����n}
����.�T~�����b6j��eZ@�~Ls-]_����r}������#V=j_�|�1��3;4^?6�\�#V=�'����]�����#^���Glc� ��?b��;b����B��!��Gld?�_�~D�����s������z���|J����x����y�������5b1�F�yy�������G���������w�r>�\?�*^?��\?b?��=_�#��������~��_#V���������s�|�,�(�g���w���~���G�i\���X��|��������X�~?��9GN/o_���b���������[��;�c;������>���[uo��O���������$��$����}m�����?��������-���}��N��P����]S���Z��v����0n�O�����������o+4o\;K)a����k��C�����;���{�V���T�$|6��!H��h��Q`�
.6XS�oo'��1S�y���G>�C&��P��~"�h"�hA"��-�n^#�L�z<%���DFt	�b��t���jp�5�i�8��H�\U��"�0���1��s��cD������D������)@u�8�;[ji�(�@�0]����3N"�i��!��� f�Z���:|P�)����D�&��C�K/������!
K�����Z����/r.�1
�������0���T��/�Jh�0�����|��w.��`�m��Z�6����S�Ha�v������_`=�{��S�r�g�����gw>9y���wW�bU��Li��	�+'�W3�E����spHEb�(R\\����
���6T���o�yxq������S����������o�������U�1S�V����'��p|*����\l����}��S�?�*:��Rj����Q�?D��,i{���1v@l�=�8j\���]Ps��k�|p�R�q���n�H\F5�����,��L���w�u�������������_V��Z���#�D���"�����z��V1C��r�D�p:��N�4R�"�N�j��LT;�������@��,�P��1#.W�0�<`9
[ ���a+nB+�� C�,���]2��gG�>A��Mk�]������q�@����d���������$�	�&�T��Z��4�9�{8Z��L:l�������p%����Z�R��
��������HO��[~��!�	�����B��Z+`�!m��
��O;�8����-n��(Q��)t�g8�a.�����fp�����_�V��Xg��uB��@�#�C-Hv�D3�bG`��:����_����������?���u`fHB�)�NW8�[he�umM�G����i���Z$��3��?��9H��y�jG��-���
1A	���FDs��6b������Q�i�0�DRk�Ch���C|�s7����������geB�����sxgg�������_^���3�o�l[�s�:>kA�g|�`���X��f�%,b�sce=>�������J��v&`�
2��R�3�����$�~q�E��-��a��H�a�l9���i�c�������A�)i���b�� I\��Eh{�l�u�q�`w�}�<��i����V����0D�����(vQ�s���r4H������
9m1V������:lg�D����������]����g'��q����7��Z8a��a���s�#�v����[��A�4��
#a��3����������������e�y�����?������/._�&����9�G�~"�q�nz{���02�3�
�I3�^���p�93f �+n�D��>����4\�LIXSN�4���a��8�Y�����y����u����}|����>�o�~�\���������8����%;��L�2~*��T�SA���z����_��7���R�`������_�h��s�{��m)�m�������?W�7���Z�`������hkY��h�K����3�4�B��O�������N�.���G�EHb_���1A~OP�)9�������3o����H��O d�'�������PD!��D+���L1Y
6:��1WLJ4���9�`yr1�#rH�u�����7(����;RK�����'�#��``k�%E$�a�
%���6�l�m0O�q�'-�6���� ��v���9'5�c���RPc��,�6H!?G+��q�������i���i������������-h����Z��S�}�w�b��F�E�����s�L�	KA	�y�&��B�L�	X�F�o�1��6�w��B��U:�b~���_���q�R�<�0$,q
�IQ:����ID���2����ET���B��;r��E�����Q`����\&��xTH���
�/�e]��5f����;uBB���<N/�x-���� ��r\9��"5BX�V�j�W����F������%�`����������Y}�]n��\����{Y���iS�;.���^���]x�&���m4�'������1&�Ha�0��J�NJ��N�~�Y)��F���0��%ul���<����$�'^�~�����Wc@)�`3���2u-D$����H�B���R��`#���-����f%���b�qP�a=����dE7^h3���;��H����}���Z)_�ct���V&�Y���e�%�����08p%h�4���J�4f�
$��<�k���6$���������,D���`L�*0�
L��������sK���ml+��o���Y�U�9�n[(&�f�t���]'�5��P^����LG���+J���'EJ��]��:%P�FN������`������~������l�h�#HVZ��8
������m�nF�������� �D��������P&;��-z����^��0�K�@�l�H����L��I�V5�����tk�����X��SKb�����no�K7�@�<x���g_?������M��)�3�#I����R��5h�&�����=����4�z%lp�0�
�cn0>��_�Z�V=k|�e�jY�7x���ON��;{�������!Y��f0Q�<^�T�p�tA{}���g���h�}��e*3�@��'Ra~���|06<k
P��'<t�>E����P��3�W$<�9O�i��W��
UL��E&���P��4���^�9�]ZG������q!M���	�����w�V�a�k��^�/�Q�**C�gJ��097���T�/h�jP50r;:y;�n������~c:1y�EzW�rT�0D��HL��Q������6b�3���01��h����a���w�(5���&��buH�v,E�0�!�R�F��J^<�� pe`�)'�?���2���g*�!$�������MJSa}���%��b��c�Y�;��0���:�):w�5�&x(����&jz }��W?����O~{���m���=d�:��	&Yu�8
X�m�6��3����� ��S�s�b�I*��F:��P�����N�+���~	S�<O����7�e�8#�UMe���bDi�Q�@�j�3�D�A���X�����������E�q���$")�d��+A�9�S�f
�\�[A�
��9�����(����b:	��m��[����n�������y����mu�9X�b�2,�n�k�6�B
Yx�@�n�jd�#�b�H��u�0z���Us���qHbM5r�:�NS�F��ZK"��3�h��
��3�����f�3��b��oy �����T�6�-al�H����F�vihe�e:)�'�� W��+*���T��9w29f��b#m}��
��(�<��v�����(:���'��?���}�?x��7v�s�����\��� ��^nQv�S@�2D�E�6x �����"})O-�8fS�A�����6�d"��(�%���i)0?;d�1��\x�t�D\�l?s4�]��3�\�F�.������	���r���x�G�Ar�K����'�hgt%�������eOL6���BHcH���X����)fDC`!Z�t��zs��%���4���I=���)y�'�����l���T,���������<�c��:����{��g����?��<���B�?������p����w����D��r���H��_B
~�����m��#����#D;���%�@���TZ�q
�R,����i1��">��q0�5�<�3D9���i�I��mO���Y���KIz(W�Ii&�rc�3"E�&d�L0�h�<2L��YB��f^mo!��4���P�!'A�;��T;G�vr�<9�l(����L�b�2��8�`��Pp�k���$����{�<,������'����H���VpF����������T�	s0��JB��D��LX0��O��
�c��9�$�}?5�U��0��u�u&�-3Q�j�#a�����p����Z�������x���;�F[|gH1lB����
V�;!�S7��Q�����
V�5�c!h��r��������Sn�&D�����#����K�\X&���1k	���#*�,��yh�(���?o����^(�U�!5"TV�)��*��S�Z"�K���u���?Z,�Z�Ju[����fU��Z�ya0�k�2�E�a)j1l��6�xh������oF�a����[W��LiFLXv�?�'�6ej3�����v�\�3/Q�V
GZ �Q��2�""��	-{-=��7�/��DdG�#~�����>����-�������P�:�����{I�f�i[���-���^:K?��+:�	��(zg�H`S�ls���W����J���PZv��������;D�q�(��:�n��/����eS�b?�R�V5���P$sj�sr����	��:iU�	��p�z�����k~14�WA�
���Y�~��v�MS����1!�6��-����W�ly�\r����e9�.�Y�D����A�+��VUW1��ys����?=����5�L�2m�&�'`0�������Xs�B�)>/
���=�I�Q4��mc�_<������B�CL����Z�	������ZT�y����f��yc[
���r#��(�J���V��n��[X��G"p0D�����YP�����>���=���_|s���G�m\�^s�	.s�l����������4�Q����j�����'������� m�uX�����6��(:�E)u�d���w�	��s4�~NW�0�]<��'�P��������_<T7$s6x�|/��F��|#��"F���i�VY��u �6\�&:��*$|7���b��D�
��a|�pQ6x(;��zfRvV��9zY�����P�XK�e�� ������
SB ������3�e�SchG2���|K���Q!�������
��h�f �n�#���n�b���_/E��S�i{����zx��lG%��a�
�X�	m�s��F� |:���'J�f�{Ul���_n�?���f�5c�\�W�~��\���O��_���i�|�=]�����M�Y�RiAXmrB,4�������YL��5�������������G�j�T�l�(��!*��N��,���t���g+�E�O���`��x���@��|^6�U�5^�����#���B�m7�=�����Ql'?n���wE#���z#_8���-�C�%2�����K����\����$"-�C�����_��������P���6g��0Q��5�������b2"\RZ��cXc�h�4b{���R�0	9��h��D=mN���X{^Dy����6g�hU
�b��u��TNBi�U�/���f��ug��A�s/r��	�c�A9}L��AV0Z�)e�qPJE������p����	�������O�������C����v ���,��O�4���O%�[��*���F70���ZZ�� q�~�R��j2�UB���T��J^�ih��������	T\QM���`���Y�%|I��C���pB����o,w'�i��H]�@���O���c�s�wXK<��Y��5�A
+e�a�|��jOB��HKx�vBBz�3$��C�<����Fh������[0�9#2��#���$���
�������qK2���XV��%���iy��MxHO�',��^k�Q}�+'<���i���a���Q�2�>��&>$������K����h����p��a[�9o�����F�F�J��������X�����W���L�q��lY��1���
�)jW&����		��"��E�y�C�Q�.��5R���L1�TAT�.��>c7e��\�\@����>V`%'�^<�����q�A�����1i�;���"�!1�)��<j�0�~�|�b�c��5��|B��<`��^��1������8a15��g9�|�*gh5��dh���+�7��W� �]��MlUp���$�J	������n@Y�[��Z/���e��e�@o/� b��N\����e�
�]6]��dv*���umI�"0�<~��K�L��m�����c�����?��@�f�/����{�6�5���	�m/��x�edE����b7�rr���\F��5�Z}Q=�t�mi�m��"� ��&5]���n���pC����{��v3u�����v�k����*���{}��������j�xMZ��*��tV9M"���\��|�H��x$��v��r%���[b�o����~��A��\
x_.��$7
4����{����|	�Xa���fO��S�_�;�O��������i��]�A����T���J	��/�j
7Wr`K��L�;y�W�����s#�f����������k���'<uo�n�S�n������Mu�i/�w��g;N���}q��Z`�G'�������X�����t�����$�,����b��P�>�$eB^�3����bm���~n8uv �����#:��
��9�HfA5�o�Dot�������z/�B��)=�����,����H@Dz�\�)*j�\H�"�J���pF�(����F+�y��_�g�B	��>����#��u�m�����vz�0�s����?�w�N8�vz�x���nf��'��d����������!�>1��z�����r���R`�B�1
��F?��Q���=�`v�������v��0��%�Ij���8����)K��
�Z%a"�zy[�)�e. Xg�J�f�7i�\8���2�]�����kh[Wx�[�������wD�l$`L��:��H�lVi�iq��4"�_��\��>`�"uv[�����`���D�:g�0�@Gzz>���7�E;�R!��4�!/E�B
h�~�n���o��i�gQ���f(���B���HsP���_�"_�o�5��H&��n$��������j�ub�Q	X��g#�i�3����X��;���n�^!ou�g�����z�s��^`k0��a���4��� �k�����
W+�x����b���1cr�F�*R5�4�v|�����]y0����5����iIg�^Y��Vw�%��;U<1'A=�)u�97.�k�[���'8��n�}f8��-�g[f�`��Wv�b������u�����W���x��O������i���]�U�����E���<��Q���sJ�������y����A����~���e���
���$�) )���e{gm�w�u
'�����OB��B���W��jU�iGB!&��V*�����7%�t|��G����p
e�����R��,��Y8d�.�B� �,0�>��!�G�{M����NNe��tU�|�
�8<G�����k�8����5���c�����T���m����(�,�P���uG��E�.���h��2�\Oy���C���h
v�g�O�&���6
'M.ye^������o���Pz��~D����xo������3j�!��l�H�f��������
���EaNp�e��4�!�"��J�C)�&F�0�v�Sy��1��)���1k�i��s�%���l�G�	.���qh&AP��Ll����������8^o	�:�o[�0&S�am�zR�L9�<5n
S�$��i��u�� �����zcyB��~���m�����|�x�=�P����Y��J>��I���m������nd�.4���th�j��|��z�9m:a��T8����FT�����IQ1S@S�,�N��3
�����Au��vh�\�L������u�`4G���_�5�!�������J("���<g������t�����\���$�0%���M�T��twQ�����*����w�lH�
���S�*6��W�d�����QP���%'�����_��_��������������E;����v���c���j�=���^��E��a��`����[}C��E�+�H���Z��7����t����$�	%�:mn�&���M�w,���kPZp�D�MU�3K
q�{�����i���{w"<*��
��$q�~�R�����{^ZZ5W�t�f����<f�JUMIf�X�E�Y��+��%H<m���/pi����X��������/0��=���by!����^\>||��]��/Nn���7?����;��}�{�}�W����������O��C��_����m~?z�D�R����1|t������n�{~��{7=��0��C�y���O�,���F���So�*�����E��3��p����:@B���<lj�k��- �n����[L�$�����G��_,Z������_��O�����7��N����N�������y�)�?���{y������O��}�����v(kMt'T4DR�*���:���o*~	W�_������o�D���.��r��I��Q
����
�N}V;l>�=���:�X��
������g�����v�-��H��,��"YO��������W���#�	�7	o��Z�O(����r�6�Il�����*��2�c~�:�M�)�Z	WF,���Z�����s����7 ����m��Go�`���������
I];��/g�P`Jw�CF�`}	k�J��a�FZ�<���QU�!p%��D`V�D�;F\����n��e��@�7nd�OG��-"�p�F����\��}���Rv�8���{�i�VB�h�
I������\)=�A���re���2^^A�\=�����������u��-����)������S�7��?c9�Q�?D��,woG�]Z^����@��O�/���`�jOpJ-�c6d�+&���V��o�8�E�H��+����wf��2C\��=f����aPA������9�6� ��7f��������u�"tXq�:�8��YXbb�fV��d<GG��a#SA��4�T��%�h���Tas��UV�I�b���!�_=MsA�H)�[[�����J4�S�wj��	0������k,�K������@:�1�X't�Iq�9&�jr�lA��s��q��R���6��W�Y��|�`�(
/M�`�O}��W���8u�>�W�g��L.*��`�oF�������$y�R�T*��p�����z�V�:���{Y+��8I�.�SS��n�r�e��I:�P�
�.$�����	f9�nm�G�mY?�i�X��X|<.e��v�28"y�m(�U���>cmqS��O0�(����hl��T�s��T�*	,��q>r�
_�4��D*�k[�_�l,�`��z0�!b�����UA	��E�����l��<}��
�f��3������j]�����
��NO�i�;�%���O���7������\
�������E��v���K��A=ep�rEb���l�[)�:P�P=&�[��w}G�[����v/���L�'}}�"�zD�4�/����'��b�Mr�N�6�0��;+�����������g���#8��b9�+�����2*��EI0"��qD������k,��jGy�-�}�+.��AdyZ-&���G�����������1�����3
�&`����y�*�>�|&�w�3n��N���R������pE�*�ix8-�
���S�����1�:3��z&��O��c�Yf�P	�vT>����^<���:������&����W������]d����J�n
��\�������F0�1w�Z�0�Au_K����{B�N�q���R������21��a���������)"6�������I$V�65��q��^����"wc�A.X��I��aD��l�0��aC��E"���Ap..�fb	m��		������n���'b\[�c�����^"=����F�}G�����t[���	��fbu�[������s������XM��N��Lg$hjMjP���d����ro���M��������v7��N������r��"\�Ek��v1����;)�jA���uZ�K+�������`�F��e����y�Y�=�V8�u���r����Q��ZUS){eAb�%�4���#T0�m0���:.������)��l5�X��DIOV����oI�u�>�P�����[�����1��#�v��%�4��3���W�yo���8�	9�{K���-���g���������x�����U��j����(^����8)�W�mhx%�����t��n������,�������3M����j;�H�KH���H;l�Kx����7���2d_���lA4,c�+�r��%���}�2���>�Z�k��������G*��	�V�����!��mff�s��&���{?��T1'�va�p���w�����t���z�}|:g����h�\����MWeZ�U��r���DB7K�*LDZI&������w�R,����C2�:����<y��������;���x�����T\����f����U\%&�E-H��+o"#���
����6U8���fV������[��2�I�����Q�Z���JSa	��0��:[��.���
17N8�h��p��*�v�:�amm�RB�P����Zu������]]�'5B������O!���!������U��q�_��)m5b�{��-?��V�t��3���A�w����vr����7�/^�|?*I�*!��%��S
!��%�:M����E�L.jn�uFo���7��1b$\5P�&C��p��J�x�J��I�)G��>�� m���U��_����JFkT���P������"gRH39�v$�������%
��_i�:�$B�=�m�����H����&��p�zi@g����s���$�xrz�s��F� �wg���}�P����a�S���K�<�;��\����
�~���DK:���|���_���B���B5J��1��������6�cG��T�	.����a�����h���0v����l�g/
yt�%��t��d	o��Ax))l8���&��A��A�KihyRri��[�$w����>~���o��a��d#R'����I��J���W�jF!�A������J�=\[���gwbw�AQ\��T�f7'V�a��� ��Dy���n�05HVu'q9��'"�����74�����3"@=�IK"��B����g�Z��������Ny��'�<+���)�����D7���>^_�(/m����T�5W�U]���K����+��h�������)����������=��md�xL�L{��'�8TF�I�P�z��t�{/j4+�Sh�����,�a�]	s���=�� .������?8�GNc��E<�%n(u���������;�3���T����%m��w�s��o.��G��_����=Z�\��4&�p�������h���~�V�����pm�������_�]������Gi��q��!�v���5[`�g�����6�K�4�e�
&U�	�����V��6t��z�3�T�&4���4qa�����3�5��y�cO�s�i4�M;Q����D��
.��V,�hh�@V�����:��i�BB7������T�GY�V���
I�[�b�Y
Whw� 4�ZN+N���4�JV�����m.��}}�L���^�������q8E��Q3)(��]k�.���i3oThw�TvRn�y�s���t���m�����_�P.���Pv,����vY[4��$���]Qp�X���D5)�T�v��:@YI������n�m|���#|����v��~��=�'�``!�C�Q*�b����U�Q���0�**n8���,��NY8��iI���{��� x:Q�@1V�f��
h�^��%?(��������V(�U
���������CbK�{�I�P�d{�|����z���s[q�'.�^�l:���L���"n�K��{_����e�(�qI���:q�<���#)��EV3a&���j	$s2��������M���z�SI�������I�mm��O0@MF�}��<�-����P�����0��b�}_�9F
�g����`sd�_��SYS�cm���KP
p�%8�zw��5R�� k�N��B`������!�;������tUn��U_�[]!��p�������4��U�ha��e�(��S������^����r����?��g����
{u};u����e�>�n��P����s�Mq���S�-���G5�;�q���_g��������!��CW�O5��ZO��1�4E��n^*[�.3���K@����8V����M���^|)�QF���(�7�%'D���/Q�_��k��b 5���8G&���$��|��I��3�
^�b�|C��������y��x��S�&��X��T��mq�E�<�[�b�3Q1
��o��D�7he�&���q�����U��2��NF}�������+���F�&fFF�G��tT�-Wy����i<J��(�A��p�#� $V���D(����N�Y��6����^���/^����P
g@9��*8��Y���2axx%��Im��/�����W8�@�@`�h;��')#� ���u�A���"��QR�@��dz?��Ah������w�Pe5L�[��4��&�,�R�8�ZxT&aA@D�Jq#��4��F���z����M�8\�~����P!�����Z?���g�.��}G����%��\�5:?G�����qMv��r�h�x`���Y���x��F�KuKb(�v<Gv����=���1]�[���Zp=���+J���a��F��x�Y@G����������J�CZ!6.P�:N��%/�I
�,���opTs�}���������Z���S�'�� �Y�*Q4�����&��ZK�^����7�������W;�6��e`��G��5��`Qe:���TL7|��S$��� ����xc���PB6����)	��Ap���m;�.��WZ��Js��������������\���J����}�]���^�O>��Q��Y�1�B��?����S�
��v���=
��(�
���}��V�{�3�)�Qzd������D����g������O@B��Cm������^���C�(AA2�������n����t����>?U�VH�/ ����m,O��LS�#��2gg{��w���f���M��11A�s�kKR��f�����|F��7?�&R3J5=��TC|�8���V-U�m�f\?��o�Q�m�|u2��9�2J��-f�M�J�j��J�ukE��r5�<~����\����0�*(�~��`a�jT��?I�5����C�`AZ?~L/��������z�q}e��j�XR��Eh�*'Kl���������f�/n�Em4�3���"/���P"��R^)���)z������00��n��6P��m��KQ{n�(�|�����
gg��S�~�/<���"���_$3>�W��������nG1���������0i��L��MoF����k4���n��.b|G������6��q�A�	4���J�F�a�MA?L\|�m(�}t��Y|� ��S6�Y��]�W���/&M%����QB%�{������y��TS���a�%�F����B0����M�#�s[�r�7^>������?��q�
���~��y�z���7%%U�W����x��?�>�\>�����r���� ��Q����������<���<v��%�p�)N�&������6)�d��"���0:P��T�_������T��#	'�g������bj�{���Y1���l26��q2���z����C�6�E�BJ n�H h���j: 9�����C���N���	����]d��w���|V��4�(��@������_rk�M��X ���`�]�N�c��n� ��@� �x���_�9|�`$��[4������Fb���H�����0����J���i��v��[f+Xf�U^-O��n�]N�\�	�Um��Sp�������4��~��Uo�P�7@���=�xWjDr���Wo��j�<H�6�\�`����������Q���XZ��uxx�:L�eL�F�aX�RaT����w�~�����,V����e�@�>�Z��zgXu.Y"O%�{�LBV�;�E��o)fY��$�>����)$1��&��j<~�sS2RLc���rY�k��=x����7Qc���?/>����/���O�^��yo�_�������4�t!�3b\Q���)��,ob�_VK����w�nb�m��Tp������Gz_��|:\�����
�4a��e�g��D��������Z@23����O{h�0�9#�G���ey<l�'����y�N��i'0�� ���q(=/o���~�����N�C������hv/�h��q8E�8�q!��6��lxI�Z�>o`���#����x_��a�Z��3O3lI;*�~�K��4�#+9{=Z
�XS�0�![�c�H����p����D������9q�rK62AS�-R�"ZX�x���W7��5U�%�x��%���m$&���m��GC�NX,�����gZ��Rz���XM�`[��gSL���w����{?�����cjL�_S�c���>��O��yw'��n,�t>2�Qj�|�A���c���@��0�)�emwg���A����:�p^q�RS[�=���w�`��k��C�:sir��
\H�k�=��;�r���p}+�L�r��?��s��	���aU;
��=��[ b������e����{X,2�c77G�A���4��������<�=o�I1�����w�?y����G�W����V���������+��@���s&�,R*�R����8s��R3�2Q0hA�Dpo���/#Z^3�n*C�m7������[�Yu)��_�@������(\.�<��'Rx�D���
�S�G|uy�7���������;���AKE�4��e1x9���Q%z�J	$��#��	����\D�/�W�:]�MCp��@)�������qW�
�}k�u6�B�]1E.�N�#�hbcq�i�2b�6���Qb����
�!�Ew.{+�zL�d�<����Eq�51���j��v��]�M8�jm!*��rW�=�:K�}��SQt��k_����f-��������z��������1v��1���j�E�;"8��Qm���n����G�����iy�^�w�W�*�C����3��Z3qr��1����������[>�E�}A����0k��/���}����\|��������>��e�;���\=��P��4%�	F9v�._������y�-J!��L���)S���bec�
yf������`�sq,�oj3�������G����!���)��#���7���Y�W����D�A��e��wL�%e�����8~z ���'�;&���������].b�S��i�F,�6��i�<V��E�����������������F�<��15�>Z������Iwh��p$����l��DFi�u��+��.����!��tU}�k�QS������4
8 ��J�0a2I0m�/�Y����j^b�� �PU-������pJ1��N��[���nu����m�U�����5�_���s�eSU�I,���Fr�7d��������
������o�|!3���)j*����+�g9_��f���CETJ	����;R�u�����9fTC*��� }��Q�S��Aw�o��s��t��g�}�����<��[Q&���?wW1����797��/�mJLZ"�bq*�k�����b�e�8|�)�,�+�Xxc��3JvSR����6��������N$��ZRN�����C�V5?��$CFX#eB���=z�PB��a��3K@R�m�5]4
���4!-�hb���N�[��J	�"��K��p1�'	�v���cH���z���L$>�cd���>���b`�\�%�(	y^
���������T���_>SB���J��"L��<T�4'�^z_���a���[��B�I������,��
�G�S�\H�p�=9qH��!�����**n���,��m;"Cs,�F�%�
Z+��].)�p������_���A���������/_]����G������y����������k�L���:F��G���s���,ib����0:�$�LN�M��,~�8Sf
����y9��1�+
����g���$$�07��5�h��w�P/-��;��������_�����������V�����\�W�B)���k��*!�:�n��f�����*CZt��%���o�
��m�M�M���:"����Fw���2y��O�fO����Y�}�'���-�3��n��imWgXoj��W�%����HT�Q$���L�����^j�y����������`b���`�gy
��u���TT�9M?����_�H������k�f��� �W�E��@�i��l�X0QM�lDr@%o�p���2W�S��4��=b~�iE�|a�%k^���'z���g5\��9H.���D�P)�i���kUJM�*M���r>��O,e��1%�Q�/��SC������l'���������4qm�A���.v�����!��b������s�@��@a�tM:o��6
*�eCE��}���"����$���v�C�$2MO<����5���`�6��Q�d�h=Z�'��2�)U���n��4(+4��t���%�T�1F���6���2v��q|��WkkRk�2� �r�?3��
*���Z5��bMG�Y6�����TL�� (:P)`o�D�3�P���!�6�	U��<~J:+�������	��(T���HCj��T�PQ����HF��R���}�.B��7���;��m�~������V������	���+0���%�^�]�G�`_R��#��0�5�b�0�W�:Y���"���/F��G�����v\��-��lD[�v�&s�c'=�w�����=���~������#2Z��b��k�QNUAg��)�`s�X�I�T�)jz����p�M%�h��)
�X������t�#�F�zJ��V�G�'�������BR�6���4��4�c���{pa��ry���J	h�BEY@�+\�e>l�� �R`k��
���+����B���������\���h6�l;r��>��F��� ���L
��e�{k�i��TCz��
������b�Zh%�c���h�2!$����ji�
'K�} ���8�2oYc���R��F�f~�i�����
�ezr���,EL=���.$�]����������J�f�5#����������&�i�,w��O�W�%�TV��(����sRWL?��$.�	3�I\�)���%W��R _o�E����\�j�@���ib���U���Z,���EX�(����!r�6��\��	��B�A}u(�R�R)�_����s�_p7&���9��h�{pk;E��}������t������s���[�@7<�����VX������2\&��(]y�S�� �Dx����}@��W&|���o?��u�t��^�xUt��f8i�Tr���4K�F]L�&!����+���h���_D�����k��7� 
pF�SRD������z��X�,}!X-�I�WMbG!�m,�����)��H�e��4z��=�������8�@auC�%�i�kB���1U�>�DL�F&b�����P8+��*Z�q��<H/�G�������<}�����K�z�����_|��"�=�O~E�AK����$!��������y��qB��@&F8<Bb���B��F�{�""Q����W�;���(�29J;A�������yS(x����)t�����~��k�2'�-��1MUH"}l' :�
�J.N�JW=��j�
������m�]o��"�R��NP���^m�I�6J}��EksE�g'�B@��Z)���X�zd*�������+Hi�*��)S�N��-�(w����w���J�i{�n����&o�_J�!!����MD���yrP����b{���,)�x��X�0���|4���a��BeN����a�%"���,VA�'������P��1�L���X9�aS_5�`8����*(r"T���I���f/��N�W�6�dB�=tJ_�<������|}.X�3�������)�.����}G��E��0��J4i9I�z����F9��-o/H+���q�^���\�\2����:�4<��P�F�V�.�@�9��J��*��E���S~�K�:'/W�&aj��~y?�&�Mk���s~[�����R�G�����9E�|���l���pnw���p�����-nYv�&��!���e���PU�7�b���>�t�C��U��lSz4T�>.q�O=<;���#?�T�4	�3l^f����&��g��(��-l#c���Z�F� ?hDB�S�j�&��2��N��@ib��S`�Wok�O\�N�q���<����]����>��QjD������n^\^��L,P|������{��C�}����o}�����?|�����j@'��g*F�Gc��c���W_<�>��� �G�n{4��apV�R����?DY`/;�,�7+�>��/�8��,�d9Q9B?�B����e�\����*	W	�v������k3�7�g�+?��������'�~���x���v��.��o��{yC����z����]<yz���g�}3J}�'��������_|�Zw@^�����u��:�~Ws���/��%������G|:yJ���yT$����GNqV�s�>���^pqf�S����GY}j[����\y^��������\	����������{�b�'�������>�+��O8�O�f�=��}����+��W�������3c1/KT��F&y�M�{J�T}��(y��F���q�M�"�}�E�C���i��7��X6��t��FR�%a�;*�k��[�$Q�e7��K���g^��(	�
��5&fD
��V�������N)�;��5���n�db�B��	*t���G�l�|C�eLW�+��;���j#��T����DGmW�/_���e+v<���>�Y<����OG�6�Z+-����\��_0���f+�]���G��y7q�:
L�	7���g����i��������������� Ge�5�86���:����J�C�e�e��W����{�_�~�d��������?���m�z�i���'M��h��-x�u��5
�:����.�����R���X@,L,���A0	��cz����#��:�y�L����2�4^ITX�H&����q�Wx��=�(�FX�G�|�u����W/_]��	&�=5�����3��<SF[�u\��g�X��J�"�V�*~����'	�cx� ��~�>��������|��M>������A���,Y�09"�a8��3����/�����+za
\9�enW���[���rC��g]t�Z��|�/���V�F�\k����H��T���w$R����B#�
�����6��#�m����r�������Y��K�*��-t�����^��W����t�����yq���~��{�"�]k�R�7	k���0�k�8�b��2����(��pn�*J��Hb���PB6����)	�Z
���	G���$T�����D�P�S��b��b��6B^�������Z�F�;G�������q�U������`��C�s����g�}�8��n8m&��~�)���|I-a�4�����*8�X��9�8o��n)3����
������W7��^]wD�tg�"�Y

oM�f�%�
����c�����]
�����leP/�^`x$j���;#i��	�?D����"�":���|�n|�)��5iZ�1h���D&��R+<�R�������5	C�����v���Y.��
����8{yL���p)����qh���]��IaQ���fN�d��6S�li3�C��������������w�X��~������'
��H�I���K�]b�t�_�����^���;��V)	���%���x�F8����`��������/=� ��h��U��gz4o'�����A�(�4��[>��C�P���"�P`�d�
g��I������t���N��U��An.u`�z:�N2����Dg�Xy'�&y�n������/�H7Z�
']'�����n���j;+����t��G�x&��z����N�m��}j�������4���P�g��?�?���������~���/��������������#����5.���������I@���m��~m8����%>��N���rY��)d��8RMS�V\NZL�#�E�+����gB��	�����Ek�B��wo|�!���#��{u��C#�o5,��
��;A�p�5�V�9�@�"P�`�6��Pt�BpnC�\f�R_9X��E������`���G�� Rq�x,�\2���`�����g��p����>{��O�:�E���~d��A���E#A�`�$�o���� ������R��Z�;���Ir@0<K�)�D�F���T*d�kd'�c�������0xHY�@���
�`�����v}o����f�2o(G;v�~����3'��n�v��&�����$u//W�d�w+w����j.9�6�V���W+�@$}���j��ju<�#_�m�_�v��_���A�j����:_�����eW�R���:�n1l����>Hw�,���vKW�i�s��� ��}�Z%���*�b����Z���
�Ud_=_��r�rn���:��{�����-xB��D���m��u����HP����_��A�{�2���Z�L��*�R��Yl��j��&b�F�>�[�Z)G�8������v�Z��Z����{��*W+���]�w7e!��=�;V
���������BS�_�3��i��.9����=��Fz�����x����@�y��\�����5����?�h�c:�f\�����>�1rW;��<�Q\B����V"���"��YQF���	8���m�2+/MI�Q^,i:��$����'��V9D��L9�7n�L�� !�<���[@�N�������y��vk1�Vj��z����s��<�_/��O�+�p�_��NZ�����\�1���N�DU)L�:VW���x ���N�Z�P�o�T�4.�		���������Z1���p��IyO��%�
g�w�r��n�mh*-�i��h�G/g��tu��hRx7��z*_�4oK����9J��^H��k	���0u	�D	�"�k���4���I*�9Dc9��'Y4o������AH������,F�s�?�wp�{?6.�n�	#�FD�����H`�;�l?/��(-�>�����������C��~��=���a��Pq�/iU���*�@�2<'�`����U��".p��P=[����!X�
,�H0%������g�m;�J��
�%L�q��ZY�	[����{%�C��r��2��
��������T�{8���|�"��E���������I�m����^�������o��.�*�1Y�����D>v�a�x�H��Z�������/�x���.�O�y�����+[���d���FV+�\Q� D�������l��@a�p��+0,��T�����/��ULKv�h��%��8�K�W'�W���.�|�����>h���������������������p��p+nm'c~x����,�1��gv��=�+J\Ie\��t��/�O=��>z��(�Z���u[0a�,D#xoFfvF��-!S��F��yrT��3��=�F�8Bc��a��z�A�&1`����!���h �C��s���G��#���G��&������y'C�F�_x���n*�m��X |z�}u
�,��|��J�;�h*��!E����tC����)x�d�.��������o~�����?�Z��o/����H5��Al0������� �o����g���9p"�&�Nx���r�B�E��'���Zw>����w���x#�+l�Z�* �q��	�tR��W�}�C4:��l�_7�<��Z�_9������K0_D�#�kN�5;t��������$/�������_��I�-�&6�;��]��|�?�����|�U/�#�����p���w�|�jr9�����p������*ph�s,i��&�>;J�2��e�����D\A9e4!5��i��j�z���F0�w6���2;X�8p����M�=�t�,�����8����A�)��lJ��aA�����VO_�������1�!��3��IH\o�pv?����.��F��q��]���O�����a����3���F�j�MJ��w�����w�<��,��fOjA��:��Z+�o�<,�����6��S��:G��l��JS���m�6�u�7z #�7�0���y&t�g�ysyV��4���Z��8�<b@�yF�W�Q�u4w�3��j��P��������?������.���1�<�RT�/���TB�=������8�%����-��K���K<��8��O�&����(��;���s��Fc���%���1�
-�FzM��f%-��S���|�\�Z��1�0��rG�C
�������C�b��1�j�����������^��Y���g�eg�T|��y��F�P�{C���/U_�K<�~FS'u�g�`��kb�-�r�
l:^�
W�KJ���-M��<@/��x������_�8Z����w?\��E-������>s����������`<�����=����D�v3c�i*h����zc�i,���q�%��Z��&��&�O��������(����UR��|�a��V%��s�fz����Gd����h#>�P%���&S9po2�]:T<r�V�R|U'D���q���'],���_����!Pv����3EXn2��w���|�v hnQ�sV�7W��=�m�(Iy�H4�yWO����3E�b����"�8��.z����������&E����bB��#pN��-�aA��?�f���7��X��g��$mb�����s���J�����E��bw�]r'�$N&
*-��G&KL��x�HOK�l1
����'*�|��e9@���J��@���F��rD~I���&nB�x9h���s���/�����*Y��o�u������z����q8��'�4��z�7�gd��J�"!G�aQ(\j!]
)��\������2l.��<�<���������/���O1��f�<�&���nkjL��Y��aI6%�\Sq��Q����%��E����
���(�P-5��'�\[nN�wy$VQW�	U��g�J_5��&�gf�8���\��Z�FD�h����[&�(�jaP�Lh���)���c���<���e�F���(�3AvL;����bD��s���@����`�����\���M!��o�v���yj��Z���0�\�K�B!aD����$��]��������a�U�a2[Z}T��X(��>�e�S�J���h&�8�����M+�PU����,��v�����������b ��E��R�C�v�������m�UkK���T��}��!=�~d�y�����k0����� ((��s�����f��+�S#�-,zjh�2���Xh��|]M^��R�grR�����J�b^�(�g����E�Scr�<�/���m��GvC�\X,�z�F���
�����e���3��.�P��|��`\��}�F��U�;��Uh�%gqQ�[��V�W|�t���Y]�DA�	?���"=(]31c\������T���d�4�N1��n������=(��������9���N���uc/����#@pm�D�x���_]�����8 g��;>��������?I[y$L��8����?�2
�����l�("��#fl�+O}Q��>���T�Q8?��������!!���~����'���}u��z��$b���n��(e�|BN�w�x�v��
����b=Q��(�����8�B_4Q����$u�����3�s=�l���7K���V��hX�0I����e�H+�`uf���1����[B��;�
M��1PV�n�D�3��T�%�������'�'����/���V��
h����Q���h<-���Z�]j=��x���3�����7����x��N�fO���{a�5H.�
rTc���?V ���*E�p��nB�������
8��p�A���BzhrDO��< ��>�E|�~|�����/_]����GT�+>�Wq�xy���wW7R��[�<I��VcA�~*������Xk��Ii���T�o���
�Si���Y�LI�����	0�$�&�&�my���(@��Q��Q��������(��r>�TRV���� N��NS�fr��I�s-j���Gd�[#���!O�D�(G�,�3�J��g�rn�`�q��"A[�ZK��[��3���.����	�',�'�����(��OXS8
����f['<�v�'��F<;qG$^
�/��M�J/z\�`��lB@�W�>h�������`��cx��ftU(�����t���<�F���3�:s���������F/@b��<`f%J���"m/���Y�j�\[�Q��{�k�����'�GG��hnK�/,��"���C��*�p���4�N�{���Z�c���8��?w!���L���I<��\���������2?1�R�`���E�6D��{/s�Tl$W
���^�B�&2�Mq_��U�8k�7��}o;�w:2��c�y��jm��������)��M�YA���r�4����G�7���e(�py��~����'*�P��������%#��*������;E[��h'����8-�J��p���y0��ci�7��D��q�����.:�F>���NGr�6���O��'���C��.���;��=��E�TXB�&b��,W����:%}�N��Q��E$��\�E&��|����Z�p>0;����T���az����>W�r�(v�%Z������*E���4���sf�)��-:4U"�@C\�e��!�3����?��6��p��1���!��N�p��1�B	WS.���`����Mdj8�a���D6��)\	J������z������/�j�)��
�!�n[h�r�v2H���0�&��8�m���	c�cm�a����-��p�0�"����D(G��	�y��$U~���Q�����$�p��Ec�f��a���bZ��*����nq-O���('do���|���S�
����Z�����g;;m�gd����i�L\��E�iW���������|��>V���6���"������0a�U/��
we��~�])��F	wu�W?_�o�w���6G]���S�QJV��q5��KC��$����D�KwQ]h��#�RR
���x��P.���%��[�+�!ez��E�j.�PJ�����[x��G�|h���2�$[e��~z�<w�Mp�k�+g����Cy
�����^Hhw�i��~�@�@Q��x����ATQ�5U6�),������7���������q$P���h{,p�q���RL��`|	M��+�UB���w�������{��9<S������Rv����4Z����Q���@���������<G	����U~Q�����?E[����S������'������
PH�i���2?����=z����T��(A�!��l�S
����a�|HAk����o���P�f#�
@
�}E,��\�!��D��4&��E��������KA}�5."�J6�4@�K4���Y�gi��:�h����a�5=�,4�E���#�[J�����1�����%���`�V�{�?�5n��l�5��]��$'����h��$���K���0n���Gi��z��N�&��1�4�q���\g������m���R(E�����E���/9��P��v&�����H(&�$Q/���#���aU�o-~
��=���Ol�����	8�&�=�6�F��1�0c�;��v�o�7q���a~�C�T�7���C�wg+3�����%��2�L��>�Q��f/�k��G�t�RT�nai�p��U���Q�$�n�4k(yV/��z�!M�cg�]!R�P��O��j*�S�g}����i��kc��:�-��#o
=��������8F�qT�K��Q7E-b�.�3�����������:��U���@�/��y�;�-}��?{�������!-~n-��8*7)��E����=���+V/��H^�$�����D-���������X���v���z~i��������P�g��	��!#��\|�g� uZ�lLuO2'n��c��:���F��&8�����s����;���?hy�'X�=�"��w�Fr-�,�����"���$�������-������0A���U�x"
���}@a1-=���W/�~�m�.(���q�Q�\&n�zy_����*��k��@�t���O���Z��*FZn<�����s���SC��V^ZL���md��f���$Rnp�v���?Y�\���hG��Fr!
w�@$
�&�������SA>$�E�ZE�|���{����gbGI��	7�7e�0�{a�J��\"l��ktX~R�Fp�M��/�r��� �Q��=�����?y�(�����O=y��i������E�{��(�����[�C�!��u��*���r�)��	��p�X�~Q##�:��!��d��,.(�\�^b��><#�t&�	����<x��Z�A����
?~�0�8A��22	8�7�XUaJ��iP��i�sXMh�����9���1�	�OD�����	�@��Q�,iy^��*m;�-�;B�Q����J<����G�q�1��t�,g3D8���S�,y]t���e�����s5O7�T�O`\��f�Q����������M��P�>�M����aO��:3D�o��R8�<��!��q��F����XU�0_,V�o����&��+q-��5�������7�&c=�U�jG6g#*�������=Uy��WS�n�i�#g(�T�'\����~��us'�M�����9h���J�+l:]L�f����5����iIg�_�v�VO�����;��bZ4N�M�5#:1qB��A��Q"���Q��������!;2s�`��d�a�
.#m�Q8���k-����|[��@�4M�J�9�s��y��������w��F������U��&�s"����(E�������+����0�S��L#��I��R�4����@SCcG�i��7?M�!MS�2`�����c�Y��\��P�N��%R��]�`Qgw�66��@�:�Z,��l�W�Z��p���?�?���?w���f������x�O��N�fO���i?w_5�h���8��5T��r�3����
�8,o�5���x
e,K�d�q�����M����},��0��fI~��*�[Q�Q�A�j/-.9�f��#�&HK$��Y�vD��+���Q6,F5QM�����L��8��mX�@�o�|b���^i�\�v�8a��Y��ZTp��8A!���.�������b�w�\B�L��i���kP�Kp�B�TP�L����/�b����/��lV�����G�J7�<��h��D^=3_��+o���hT��W�6�a��8a�Y��0���E�E+�kJ���w	@�4ha��K'��n��6n���{L�{�iLe��*����?���K��u�5��Z�0�n�K�n[)�$��-m�+Zc
���R�:}������u��^|�!���|��dG��\��	��I����/����M/���D�����6h�h�W7��R�)�LbI�|y{�i�x!R]�(<�:g�B�������?�5\���?�#�f�FT�e��+S��?����;w-��x@��=<@��91�2*�,	�����BD"#��8i����T��N�I����$�Ke$C=uh'�Ba$z�h�F�+UGk����rx�����7eI&����,IQ���~��ZI�v�����w����4�����F�6��u��R(�fiW��<��<k$Ag���L���
T�E�KP	-�7��P�j���;��-����W���	�_~����-�u���J?��8���0�J�]�0��s��l��� hu��Z�9x�s�6��$����ez����86k���Q6e,Kq:E�GuS�h��s�l�t���;3�1�+�����RD6zS���!&���Uj��f�����|�|��T��X��N����h���	Cs�z�h�8m�Q
��h�>��'�NPM� 5��6����i��@����j$_M��Q7\,�����y��\�sbk�p~��G���-�>2�����4�H��r��9�O�[�_QG�!��$D%��^�S�N~���|=�_j���V��~���B�����8j�M��,j�)�&��������]|��Yq����5�E����5d+��C���$X�=7%�9�	��&����T���@�vc�vWbGJ��|����{��������__|��gO�|�o�����������Q��z��w�^�N��q���L��!� ���K�zo���|���&�!������q�O����q���Kkw����7�a*����ci\aiS���>Imf��q*���;�
$���I�~�����mN���$@����7*�s��F�EQ�Q��J�������]Z�J����p6��G���91CX*8'�H���C[B#wH�k'��(��a�v3'�,���C��#E��&��Q��
Q��'��8j������s�*Y�~s���_8o_����q�#lT�z�����v��_?~���G�q��[��^�a]kU�����oo�CX,Q�T7r���ZB���q�.��m���
���2�
��i�d�r,.�^F�^(�&���(b�.���C�V3���.��������u��W)����3��i���ba;V���D���1o����j)c6�<q� '[��"y���!��=����jh��&B��J�J$s��x�Z��r������}�	(|�Nj?��P�m%9�S�l�a����D��w')��pn����[|��$�{(*f�Y��]$�0�X-��S�(APi�A3C�������g
'�G���v&���tYa� �E`
�4��_���Lpr���*)�
��j�r�x���K'��%Z�&��f�I&i.Ng@	��Q�;_9���.�EB�aP��8�#k��DZ���b*n�� �)V��������W�s!D0	/L\�������+�M�V���L���e��3D��y��;I[x�i����'�@��V�QB����rFC��9�r����?4�y8�@w�i�`�<�G�������;b����`�Pw��,c�L��x�����7���C'#u(�������5����O�0hb����8��k��j��m������UG��x,~{��RI���L�����^����W��,C@�����Dh�r�������4�gX��X�������.���U����$xv���O-��Y�yG�����/�vMd�(��@	0�.S��w����%x��{���)��{-��8�����8�__�� Ih��U�Bhy}�����rJ�)53v{QaL
_
�o1��L�����z'Y��pf��y*��C3�(Qe�m�n�,�%;bT��>�L��sN��i.��g���d���-������9�_q���m�?�Y�?E[x��iG������k�
�����
�A��'Ee�:��A��2�c��|/���?]�6D]t��>]tOu����E�y{�����_�V��D'c��lb��7\�8H�"H����(N�a��p�q5/r!E�Jy�t�����%y�>�A�M���1�F��Fh0p���s8M��q�M�`��m�L������uT��#[����r�j��Vp/�Y�B�����Dp������]Ug1k
f�`}=�R,q�����1Au�h����[E�Q��rZLnr[�'�s[�����Qh�s��wM"M�P��c���Qe��N�qq%����=������j)���t�^�|,!e_WC	��a��3�!kia��ftSZ/�����P�heN@�/_]��>8����+�G�XGa�M��s����������?����?u��:i[xzlj���w��e6��������t�f6�(�H��"�&v�������c����C�>�w�+F�B������������|����_~_C���c���<�?Q�g��5Z��S�����6a��s��CXk���h@�Ny��������z�T�r[;�n*�7���Q���H��JM�@�0�hq$�2�x���_���^f���*'���8)M�x�T��KL��7�'4�sKF��H�d��,���K3��6N��>)�������-����K����8
��o~(��m)�'�:�)���u����VR�U[�+�|
�c�;K������~������8P�����.���XhC.&�N�]n'�9��� i�y���p�)Q�'�f��5b���������f�;�e[_���?����������0�U�;g����qP��B}�^��Y�U�'.�i�GUm{�L�b=[��jXF�S]\#�7"/)X�@���6������~�-}U� �2���/���}����\|����8�\���O�GO����G�O���!W9<��?�P�C���_����/�������r>��R��s=���Q����Lk�����_c�?��?r������/8G;�q4d��?'i���m�D(�����a���Bkf=�z�U�����G6@��z����g[���	\���<=~�Qp\�A�s(N��g
2L��t��l��4(��FsAc+$�
I+>M(zsz�L�k%)x����U���B�%q3&�l�%�<��b�Q�t�}%����$���7ma�]��3GY�2���_��{������@���Ne���(Ks�_��%<S�
w������|��v��(�{!k���h�p��U�������1��}c�5h���9��qp�u�^�*�y[SS���9?�Q��7�O���M�*o�Q�p�\��ND�]�:�Q�	A�r,�8bt�(}�^�N&6��Q*�J
Z!��	�k��cma@�>]!	y)��'.*$�G�J��`@�!���^�iHP�HY~��~WH�~�%x�~���o;��F�p�^	6Tz-���aC�����vd,��>��������N9nw�r�&�������oZ%cP��[T�NW�2$S�����Y�����!����'�L�31-�3��.d�aZ�3d�2q
21l��WKh{C&�Bb���1��B��w^��HE���3���8$����),�/��6.Zc� i�ni��M�=�y�����nwF����$��i�bx�3��GW[��������GI�����#S��i���L�9�w��j�k���e-��9��e���~�.5��#dP�T�&�����;S���	�@A1a��3Y�L(>3�^���O���W���j?�y{����o/|D��;4�._^|��a���9���&
��k�����`P�Q����Z)���gd��!���"������:�M0�g��&��;��UN��>{��\jm��prc~&�k�A49L	���Q��b8�[����
���5���=��%dP�g�%0��`@;�0�P�(�VH)�������4aP����h\��]��K�J�����=��k�?k�+*�������{��C�-��R��f���:��@i�u<��Z���P������6�	��B;8��������g��{�`�S7��sZ:5S�R�]�dM��E��,��[g�1g���������zL�zP.�p�>4�n�(����k��j��x��C;�`'�%�Fh�T�M���G����XW�<�*�g3W���i��pV��&}\��qK�[��\)��:�T��dJi�d>9�yJF
���(��JQ����F������x�Z�\�����}�L��/	8���&��>�8�r1�6f���P��*���Zx$��|7���m���������Pey�O�����������4�����{t0��&��S�]A�#�<���<�����C��D��>����.h$�8�.���v.(����o���U���R�P��>I;��X�NZ�>�-�R]2�YNi�]�cMT,e�����A&��i?�A��{`*�+2���~��|���������}|3�i������0Z�
A��	s����i+�<��s�U��
��b���������#�-��jDt�	lv�{M3�-���N����f��G��8G�XK<|���C uH�nf�)J�'
����y��sk����c�S���Y�r���|��g7\�����"���l�8���}�_)_Ji�>Ww�J�����m��:�~J����u)�Z�(55#5\�|���M���6	�<�l�����H��z8D9�Z����n��&-�^f0�����gF�X7R�	S�V����r%�����a���f�����5E�jLK����5����t��hmv�;���4�90����9��(��B������� �������2����?!��|�����J�Z�������d���|��5/FZX��"x9�b���D
����+n���U0�|Q=���X
5�c��1L	��A5��J��p�j��;F���G�����D*�5�1��8
 ��
!�1+n�%m�����VJ�-��
�L����R-��F�)�����O��Y��(@�����p�b����Y��u�
���������JI�PF�llD$�J��D4QE�������>������k�a�/�{���km3�����������VZ�?��U(���nW�
�R��OJ��*e������_����Un��~-rh��H�AK/��J�4�,�M���L�A�r�I�&&U��m���3�9^u�_�V0
0I�����s�i���o����"@���"�?1�����3��I��������,�#�`�4�����1���'���VmWf�N1�3|���R���A��������R)��"��4�)�W/������`�lD��:-�s�J��C�lJ�f�<$8�(cy������)����"v�e�0Jf�w<�%�3�F��>k�`���W��e����M���-��6����|��u����*F�Tv�U�(H"Q�l4��������Q�	Q���<�Ke�����k�1���Gk8���m����^�w�W�*�gbe��4T���SE.�T�)�uh�/���LY�(�������:d�2��Q�e�W�}��������<�3���N�V�:����_�F��x��X��	3�^���GP�U��n=����?�T��H7���Eg��2=��k�C{/����D�J���82��2Y��e�)<�r����r�������t��8�pr�B�vr���� ������0p��s*o��3R�D���`�<_����i��,Q)�h>�tZ[�0^�:�ym�LN��\��@�Q�Hk	���������}����0
q��nt:1�����W
o��{�����USw�O������_��R�na}�7��b��M`k��8�#
�E\K�zp6���U��Nte:C��D.)�A{�w���	�Z9Z�����*45�P������[�-���X!!�|��$^{�bl7�������N6��������FR>e�������J�7�x-��������;�3��@���������3Rru�pR��:��B����:��#9���� �������i�K��������T
��4m�����<=�}�����?��n�_5���hebt9�0n����nE`�	��������� d4b���d\
�	�S���#?���8�f��3����G��/����W��'�e�E����'>������T�j<\v���A�?am�he�������?��O��a(�h�����et`n#������vJ�?��Uo����g}����������C�}��s���������w��O9)������/��v�����'�n�����|�W��g�1�x�6���A������/��v���?c���[�6��?�\�{�A�������8(u��>I[��vh'����.48�����x�Z�m�������o:��;xt����(�r�����	��q+>������%��q��r1���&�|kA���*���n2m_.�!H��	�e����W�'�=H�*E���Wim�G��c�b���j���Gp��*�����*n�2wy�>8H��YrJ�>�i�uq�����h:K�L����0��O�?��������z.���i��[2Aoo'Oz�~�B�#�r�&��������9]��;&�c�0��J�e
�.D�H^%�M����~�*(k�D��m`���U2��Y.���o��I�����8'_��[���1�����K�?d�{��s��I��ayx;���r�%��5D��`Y�
&o���;�Wz#$!`�}��1���>T�A4]��C�A��'�=YWX�B�R�Rl�����Cy����B��X��!�	��� f������[�O�8�����z�o/Sl\��O�j��{8nqDm��������xp�������`p����u�j�V�TS+�=R���H��]���4�Tq��������6��4B���9wn�����G��1�H&R��m.
�kB�g�����M�e����<!8������"OF`2�K��~]�����i������m�����-�6W����V��g�����l�wo�mh�+�A��w�1�^$����8���k��!$	~�@�L��S���\T$��a����c-���&B�
���^�<�.�'�Z���77����>�����"���2������`W=KsS�u%�J|�n/�z������w����j(N�t��>��xx���C�[�[���0�n��:����X���l,��D�+^v!���Jb��Jy�NI�����{y���16GADS�-���\
m p�;��	\�h��G9��e!�����=������|�A�~������{C&��B���Fm�����������='��!�c�����j]%
Z�t�Q��U9dnIEi
e�I���FZj��n���#wGr���8O�?���B���O�V(�v��-&`������7~n�vn�����dw�!�����M��p@���b��>�k�C�������
� |Q�����l/�A�o�'#4�mfz�'z|VfF�+M"��gxU�Q��V7�EW��"��
����j��gEq����)�_&�Q�6o�Z�
�$��&+�_4��>��"f�t��E�5��x����W7/.�pu&"������G�=��!����������c���>�G�UD1`����������`��~�������=?�����gz�U����q��Q���N+�������f��o-�ln}\T��maP�rn�b!��u�-q��W'�'����~��Ge������.�=.����������^�"������?~v�������}����J?��2�t/�0�����'�u���A���K@����_"����; z�g��D���G�A��j|�gU8w��:��9�g�;��/l|�����<;l���u��(X����)zp�P+��������7����G��������������{�6{H�������3I5��}�
YK����~��(�*%���ot���L��C�%�&�|���<�
pc?F��x&)(T����S.�����\1o�k�����TK��$�p<����M���vAfA�2�>g�|��!�4��V
/��q&��k�!Z/
��h��d�v��6%���"�0D��G�|mc&)�Z��9�~/�P�jn���2J����A��cJ��pX���DRnp��8[/���b���z���������$�+Cq.J��N��#�k�>N%e�i��s������������	E���K���Hh�P�&
#�����;8�5�s6i�
�I���%���;G2�l-����� ,+��.���@*�C><�t���S���b$6����/�u���W��#�q���$�~�lPk^R.���&��y���0���0�(�������]�xC���nE,�$}mp$���H�De���v��#6�}������ �4��O�\���+J�L�XM�}���C�&<x+8���=���C������b>������J�Ic|�b��5�w�,�	��x���%A5`d�;\*`���:��(h��@*�^����]|{-�������� TW���������=E[u��n�r��'����s+u�M��a��4.��s^�����I��d�<NH"o�^F��S����3E����'&�9K�	���I�1bbT��r�������M)���h�&%|k�}L��$-h�JA�K������IJ_s%�[��@`�>��n�'�%oy�Uh��D��%h��,�'e�I�l���<A���\
��`��(=�����$�a�B+Ec*`�������{O�����l�����"R���M#�	�������6	�7����ih{
�"��hn��E�S3�V�/��������V�M(Xk�������}����k>JN
7�6����[�u
�������b8T�kT��\J�b�
UY%6��*c5��EQ��8�{L3"c�+eA�I��N�FF=� Q�v�IU(�]�;yB������&���K[N��Q���#u8BB��>m��������f!�p�6�d��Z����l��Q
�*���w�!��DNk(��r���*�e��v��p���@����4���X�UzST��6S��p���S��FA��_�Q���h1�����Q���Q�~}������	�9��
���S-��>��N�v��3������/��2� �nT��PI������*7(�i����V�v����c`�(�3�T�%�g���B;�/p��o��9����5)-��I�=S�Q�6!#2��I__ ����?�Z�����M��lb�x�[��@�����s�Y"%�M����D���W!��WA��8YBf��W��K7�Z���Vo������&r�6��0t���i��p�T����s��m�z_��������L	&��[n�/f���JA�s$2 ���Oh}s~1�t(��P3�h���^���I�A)���!��Ct X{���}��V3�PbP�����f�1��i�;���
4��M��@t���3�G@���9q���D2�s��v� 8�.�^{���"��^�F0N��\)Hf��������5b�R6�@����1)��p��vw_AB%����_V���{������p��|��0wI�$K��y�����Xn��p����7\C�rw
��\�_����]c$�A��R��gD�aU�����!QM�����L;����x���R{�kI�,�S�8�Npx��Nr[Zkv�Ok�i��>���IktR)�:�P�1es�Q��'��a]��z���&��Z��*�<M��&�\���	&X�(&�� ���-�a	�w�5�`�c����-����9d��J.�����G
����.���=x�?o������q�.��-��3��#;���s�[��8��b|��'<sf�i�l�*�k�r��VE:�@]E1�`�u�r)��/]-T?����,���Z�R�"V��B�����3X���>����_���}�ag����U���v�p������+��9��{��iY�jh�<IrD��!F`0��\]��I�����n�������h�O���NN>�z�
G4������P���{��_�
�������jR6}��R�	�(U��G�
��k�+N%����q��������e'��E���	��?=��j�v�;}lk��uR7&��h��C7���/�c��"�S,�������#W�p������<���J�����Z��*U���+bJ"<�h!��[�����ojK>k�Q
1�j�����-����67P['���R]0U
����>���z�,�����GO��Ul���j%$-��l!�=Pm����
R���b����I/�-d�XV>��\�~�PjE�i%/��U�r�.Q��X�k���,���t�����D�5��_�BC�����������~���X����B�|��R�{P��P�����%B@;����B�%;����
��@� ���;�e~�h&��_F)IdV;��"6�������_�Gl6.M�sp�:Z �e���������(�g�P��T_�&����h�3�m�v�'�����mi��q��b
R�"YpU�?�G���7��Q�;#���^c��>Ei�|��J��"��P�������q#��eg��@n��w03�lk&�'��������x�H���� ��o�dw��~�����y���Z�*��b=�o���683J��b�n��9wP�.�~eA�L�=(����!9�u�e�u�kn
����m����)����R�?1 0���|��p^�.��~��J[M��mo�����F�F���"G��eQ��/�c��z+��1��h��'h�jh#��=��A%����8��!����1����V�����u�c"�������1�����q�H������,��-]��v�|������rq��n5���r��[�6w�����Y��)%�
'���?I1}�u@�}�?z�G��\��A��h?���Z�IX��q���~0\A`+���|��S���� qW���=`z�B)y����%v����|_�=(&
�r;b���A����<]S���v�B;��u�w+?J�-4Z�wkuh�_�X[�k��f.U�k]R��^���@ �����y�W!���,mv[:��	���/Z�2��sR�qlp{n�6����*e��bC��K�!
���@h�\zsu������0
��[�����bZ.��u1�|6�h�z�3�x��L��CYX_�30{A����=��	�����J4e��0��(h��4�u�!+	A��0��Nh!�zj����jE2����kjB��btR
Wh_F����E;���
b�UE�u���v��f����'a2�X�+3�
:����Ri��TW���T���7iv)*���1�7F���q���t����(\�����J���kP�X����4�4C�[pFoS�`�r����1�2�5���PY�{#�%u�Y�qG��4�����b��g�����(���o��U�Bo��`�����(��rfHB�M)���&�sgtjZ��Ah���-	<����j��f�Mk�J�j���!��Q��.xFh
:��n��6i��-��H�1��z���;�t����i�����s�>p{���G�m��	�ZMv<y�C/}�N�c�L��
Cn���b��XDaF`0��t��x3�t��" 0���\m�{�m������?�"}y�c�q�&���v��	59���a_[�9
k	������cw��cN������h7��9��O��T������J�>X�b	f���"��5d?����i�/*Z�����|���%�r�9���2l�
}����TU�F�\!<�g)�����12���$�Cx���?�I f��\��?���%���;G����lg
�}l��z�tc<�g��������b�nb���>�v8|�4Lvh�%�iu��������:�^�@�>���Ojy��/�/������>�M#���o��:�L\*"����f�U��U���B��)s��SQ�^!��F�����O��E���Rw�����$��`�$��$�Q��d�h������Q���Fr%
���J�/���\��5�0�[�����H"���y���
g����-B�(����
9�d���&�m��������
(�M�����?t�<x�&���������<�]���3�5}���~��W�=��~g���O��:��R�5q��_�����9H��^�x
0���{��m�"@?�%�%�C
j+�w9��9����+�RN�B����R$�:h�����kx�U��3����Z:T'���1�XS)�,c��+�O#AG�:f�<n�;�gF�m?h0�-�>���1�%�/E������n\��8�I�]-\��������<#?5.Q�
6����/�Zu����ZXd�"��	��AN{�vM�cu���T	6��1�
C�fP�p�(C���������IuPx�V2qE���
b^�e"����\�LwFv
����b}�������<�a����{�M��[tpH�A�Pm�o�j����zTHHZ�<pp�;���^�BJFe����NC��M���#�F6�8����w��`��	H��q_6�bz����J�+l�]}�q�����v��nn�%��xd�.zL���b�*F���K���e���Gl�S1���,���e1�"9������&�ro[/�#O�=���4�)���Y_�%�%�s�����l���|�a�,���xeu����`h�A�w�S�t�������&"j��,={�������}���w�n{��+���X����FDm/��/��������I!�{ !��m,O�I5+A��Z�,_Zvu�4h;���2�W_U��F0O*�X��b�������5�+�������h��B�y�L�'oKc�(�_�X�\��� ��|�������
^S�+�4��*����}y$2X��n�+�0i��=��K��]�J*�h�g~��c��������@J��?tA������Ln��:]&w����[��d|���M���Dhw�|���!��v�w��3��t�M�����^*�����c�U�����"�����r����������3x���Oo��jg���w�/�_��������"
%7����Q��q�_m���������<���K���w���5��.�;�1��3�r�n��t���t�K��:����x7�4]������/����#��`)�CbRB����������?;+AZBG�����	^x���>,�-P�DH�EU������7��m��������-���M���� �&�^hM���)�����������:����y�����E�}��|��t������?������6i]�����L��!�E�R������r��"������\�]�q��%�o_����o�f"��=��cZW��7��K����0�c����$��]�G@��\����6����%���9�h�)�����+2����uh:X2+�^��eI�`������L���y�H�/�,��sp�����}�L������56��� ��(����;������I�T�o^'�=��:��F�:1���p��NBF�k��rIB��8�P����1tO����.�����D������i{���1�m�F[R��`� I���<w*)/~�1�r

xHR���r*@6�I��eQ�I���X0s��_���2��E+-����R|I^���X��y�R���_�5�E�>}�.7�F�q����;�_A�����TP
<y0a�'j	%��������J�	�%F������b��������gF)��K )�rr�!S*��M��1VM`��0��H��D:���	�
����;Cgd�l���������4^c6�'�'�\Ho�����(x�������<���W_����'����o�~x��=�B��d��F7O�y�4a~3�'S�jk����?�}"1�D�}:����v��z��������������s�l�J���a�����Z���b�x�t���&����I����^[Z-������ ��	�f�VL���u���C�YX�}�UG��	 q��6���P��iZ�ElS2�s�Tt-���8������B�@V�yx�������S���v��������w�����Xb�L�D�{�EG�K�
�`*�# ��7$�����;�(�9%�h�f�>m�wo��4��o��Y��6���7m������	y� p)�p�X�cF����Pg�a=�s�SwQ�5�fB�wv�����z�����)���Ypg�	A������J�TM�Z�pH�:|��("�D��D���p������
���:Z@������,(F���N��<_��p�����S�������xn�i
�A�������P������wB���S����6J����p%=���'�\5����&b��3�4v��y	�|������
K��c�8U��=�Q��5r��8��7B{cn�HM����v�)
wF��&`��g��YX�|RP	0���i�?�������cS�������V�����?��ae���n����YX��Q�>u��>"�g���/nZ���������>5�g\�5a��K��rL��_�� �J�0(���\h����S0���?�v���j{����>"�g�|�e��9�)b�]���3��o{���I�?/Vk%jq�Y��)��g�����*���������72�|d0��^���iG�?��6s_�O�g��~��dYU�a't�O�����W���%8��Q�
�HY�F���,�?K_j������Y��?h����1���+q����mxW��;�^}A@��H ^J^���|h��R����qVQ�S��V
�����Pzlh� O��	k�ZRN>P��@���q|g>�� ������;1B���
�$�Co�)>��N'�'t�j���U�t*�T��8������K~��2�t�cT�}��-���+1�L#�b�/~���H��?~qs���z.�.>����@�#������J�O�?:O�w�s8I����/�������
.�?gi��]�����w�����
�#�A�������4`��6���C��v��'�v���H�cfhLz�)�
��K�PX�������_������������6@r8H��U�7"�e�����zyU�DY�2B�����+�!���<P��h|�e�Rid���_�|���z��W��<���g�����$����[���G���S(���������d���B���E�2H[�'6���(
����LW^�3�-���Y^P�5b�Wvr���L
�h��Vm}�O��4g�)���������,����l�5�>@o��!������m_�1y�r/�@]I���qh7&JE6�@�{��c/Y�����M�~�b��vP>6O
,
(��=�Y���<�~��N��r�k����&�n�g@s�&.Rh��;!��O������2fY?�Q3��e^��H�^���������60���MBJr�N�c�Y!d�=7��4i��d0h2����-h����������y����o�,�'m0���:��sN<�Z���~��U��n�t�)�kMk�F�Ul\e�4.����-����QXJT��k8�����Z�#�-���6�z����v��go�o���|�������vA w��d�hN���d.���>����3��iB������?�<U����������[�9�~�)�k�Uh�S��jC������OJ����7�|Lmw6bW69zOwL.�/���WG �,��<;�[4�AQh&L�p�sU/A	
��=���ku��bm����E-�e��$�Yz�_.�P��r���,���`����������S���*��3�8Z������l������A��@�$�
5��i�k/��:�Tu��`���T1�VR����&�N%^F967����I��[KK�p����e��iJ'9����}+���W<e��6�����h;��E@�����>��;���GT��t
5��n�����*6��j�]!KJ�>���o�^��dx{�V#9�<l:/�
��0�suC���c��qG����1F }7��A)>T�)��D���ToX�#R�GHtK���9 [}	P�6t��6���1�]����� �-t��B�^1�?����I���?��D�H�x��b,(�����X�x���VIhw���o�}�`v�e�HB%o/:S�X�L{���8�QtH���g�Bm�W���vK� �K�����?[K��R�5�sW�\��a�T�����7N��1�%��{��F���8�R��
lS�����b�?c�a����*/���)E:��p�Qv�Ap��S~^J���(�EL�4��o�>FE>��?������L��s�S�G{�G��_�?�h�7�K�0�]��]���;\zi�vi�[�]��	/��e���P�`���v�=S-��o}�����������hh������q����=��2	�{9�PA�
�0����LUW�����&6(����F����-����z��b��������woNh�FI���;����g
{Ok�w�����w~���/t���:�x�}���O7�/^�����������G�?��C�}���:|���j{���Ve�_����o�T)������o��z�|���AP����h<����:������u��0������.��bS`��_#��M��kX�Tc�w��������~4.V��] �qDiy8yX@���������z��<;�K�n������g����??����]=yz������n����_)O�k�����S����vf�����������O|�mA���D���[�.]F5�r�R���T;6�}\�����'l���E����[se�vn������qW���t��Or�w��Z��8e�\���2�������������^����UsrND���Y����%HIK�\����s����4U�"z\F�sO�9IT�[c���2�c�����<r.<�Uip�5����T/�\+��z�2����"db]>aY��?{��_�e�F:�W#�V���5jg��W���Sg�]���Ud9_U�y,a@����<�` ]�#eR):���P�Q��*o��X?DI:�Y��J�:/r��������u��Rp�.�
��E9{s��>	�RvU�|,���T�aZh�����������������D>�?@{�W%���;G����G?<���O���A�Oh������1?,�&$m�>�Y�,�:8�� <�u����'7�ni���������������������,�A'V���P���6�Q���M�	�H8��V�����(%7���N���k_1�!��@)��l�����4��wc�/��������Dd�O|�4�H�
)����r&�%R1*��`tg�K���t�����@�J|f�Q��LA��=���6)��i�}�@d���*������x:�1R[a���?=�����F�b��|��%��U��z�f+�9,���X��7�-���#J<|�p_��������Do���Z Z��n�asZ	(e�I�(A��&w� ���S�
tey;I�1<������������J�A�t�=K��nl�m�^�Z��2�h�i:�Z#W|��nw��a^��oD�i��
�M(����{�b��#
��b'���:�@U&TtG,L<�\��"���^Zb+7vj�6�l�N���(��=��l���n��T5
`��5AW�N��$���F��������ze��;'�:n���T8	MN�D��H���g��1\U������6�w��V��H��?���y������WB�� ��^�?��fO�}�l��Wh�e~#�������_�RB�T%0+�y���+]h�K������t�q���������>�>tX�g
R@C�I�H�J��&k����{&^j&�L3�`=��q'��0
��k
�kkY�-]�g�������7�h�o#�Y%���F)b<��l���|4��.���D���}?}���;-�����D�;
�F�U`�(@n.����{�r��x��K���kc"f����`����X�]����������������;�k�����#�{bL5��A6����0�������^(A`S�W���p#�
�+~�Qw#�X]��n�*����+w���l���B��D��UO�)��5��Vp��\�����np�
?.��*#tzQi\uLd >:��qdi�B�5o���Wq����J�%mE
�H����	(iG<��1�jw�R�4����@�	�~����d�Z�g��z^	/S����q��[��l��7U������������c����u�2�����:���h�{j�"�������!����[�3�h�td����U�������&���4��K�Y�������
���M��e#���i%B�ih� @:)�:����?J�
Q	���X
�d"��-(g�g�L������j���*z����+���0K�hr�	�x.Kn�QM����px���J�&��\��	'-�kB���)��?,h�CM����L�6������*�mST���&t�'�V�(P�WQ���C�v_;��	���Evk�����]��B��Y���k�����J���Xr*	�"P%���C�
��A�����p�J�gs����?���*������HZ�3#�U��K)0n�3)F�h�����8p������.��������v���m��_:_�b�}��3������i<Rt�g����������{\1K5��������DfV" ��������F�L�M��v��V3+j��v�x&:�^��oa������h��0��oG#�_S�����qw�p�MS�DIEI5Q,��c3I�-�uE��������TK���	��)�����.�=.��u)#��������s�,t����^ �~Z����k��OQJ��A�b|���|�S�
�4J|��������b����Q����1�u(i�qj<�cE���)�jNu���x6)��[���S����
`4L�5���08��&������T�Bt��Y�,����8
�8F4�&��}d�^M��n�]��O�3`mW&33��|)���x�u)E������ .>~n_<����Er�9�����2MX l���/�_<Qt���~������R����G+�Fnz��
�xq�����������Z�����|��������tm;������s����%�������qM�����bw�C�,W������YPi�32�}o�a�����e��y��h�yx`���ct-��,��Lr�I������l�����U^��/����<���{�C���wa{y��&4"�3�x##�0W@ ���C����/_���wW4�����������Z	�����+�����?��/:�T.
�P�CP�x��C��z�#uf,5KEJ���R��3c����2��5'T7����Z�.B�*��(+h%������QD�$�Tz"K��J�v'��Kz�r������i�FX|��X�;����V�5�fi\�`:��H�<&���=$��q�������=�LP�+����c�5������W4����k�w7��F���������w|�9 IP�����A����uM7�V�FK(��4���@B�)��z�?���$h&��));����/�p����&���v���{C�o��K���F�C�'������Zt�h��+U��$���y��@,�Cr�f��2=T��CC�F� !m�27��Q'�,h���q.@���bF1�+����A��*l[��h���	m��a��ln���N�����Y%��Ov�@n+��R	!EC�af���r1��j1
SY�H4e�������Q���4��Sm��68�mC	
������%}�6B[�y�2�����-�[4z��`��.0H�7r�	��|�w��R��5dJ}���Ziw�a9�v]`�h|��t���TI`-�k%H`���k��c���U�
����6�QZ�K�J=J\P���5aO�V]��F����6������#�����+�0��/#�����<'�D1����k��u( ���D�nm�����~�#f��e�[5dc�7zA&^��P�V)E�b/���Yo_#@t%�iyb/}�8-v)2�s�+�����n���/����|����k�j��b�����	-Oh_�a���Q����{�@�I����m2�/�0!����*�e�c8Ni������0��Mf�"�jc`~W%K����c\���Z����c��^��V�i�P@��}��o�jm�n�����ua�����z�g���`k$Dt��&�-Y���=���su����K��;O� e/��C��,I���d��,��4���������A���`w��=����)��p����-|���3U�}������cO��wU�:��Mr��p���X��p�<�OY��x���Y(O[|hh�P���	�<a)�(���Z�����������@���K0v��i��ap/}�J/��s�9���@8#���2�
!$'�-r��	���Fh��J��������]C��I�	��& \r�s��>@/�����i{5��G>@������_��
<�>@��(�f�	�	����G�G�.� ��}$���'�X�3x��o�o��7e4��.��������z"��@���x<�-��iN����8�����>��
���v��U ����4S��S�0��/~����iSs�peh5��X��/j���Z�o��aS;��
�B]R��S
�d2�|��B���6�>V-��l`�~8�L�N�;*e�A����h���V�]�ti.G>Z�b�j���G}���
�I��
�9+d�G��LR@W�����h���W}������xn�>Z�>���h�b�����K� �
m��k?�]�a��8�x.�X�����O&���{���5��.���Z��G���h��j��V��\2j���]�}g��;|�0@�S�!�A1�������3������3'�Pki�2_0(����Qm��Z)�P�e���/����2E'2�6�^�V�������#��O�^��07F�/��Rk�R���AY8!=�f�TQE��\=K=�	+������U+r�>VLQ��x[+�l���HK�k���&<��^�����=VL&�AY'��)�\��N�v�R�s��g�ht��������y���J7h�T��d�{�!	����o5F)��<�ev ��(�/�K��2HML��v��_Xp��:$�n�AqZ:5'�<�hwL��E�d�o���]Or�z2���G���)�F����;���Q�	8#h���$�t��N	�|nBS&2�s)����U<_@�j/�q��6I�]/
K����Bh����|��J�[�E��ZE�t��CSCvC��T.	����w��C�J*�z��>�t�z��x�"A��
�k=�q��/	�/��j=�����)���������b�~.i�e���#�0�����5�0�����/�������?�����P������������hXh���H�7ax�6��d��^��@�#Zr�K'E�9X��
^�E3���p��2�7]5����O��vZ��?�w��)��������Y]��h;w�e�l,m��}���3Cp�/
����)�3l���@�M��mW',�i?����A�MFe�D��"�N��NN�}T����Uv�%YO\9/.3�`A�tk�����zv`��W��xqR�������k�(��KtB�	n�|�f�Z����MrW�;����G���Zl,/�?�=���/�K�
��n�?V]��'w��8dc����+?�m|g���������Y�-[�cI2��\R+	]M�+��f�R�O�X��+?0�� ��N�0��2Fz�kR��Qq�0��2E_�{�U���o���:��	�)��FN?����`m}jq��1���Xt�6�By
���o~w��O/i��t�8�mB���yC~s���J�x���6fK���[�aA�2����
��0��_�.P)�U�	aM4�[VP�L,FGBS��]���#<N�
����)�Y�v��
�y1�`��5����*
���h�ah�E<����p|���������m�����?�A{t��t�j	k�����r�����h�"�� 	�XC4b���aw�����rS^B\"�|��V�|��	�Kd�9��J�N��n���e���:;t����mh<��?{���C��V{���O�Vs;�L��Rh �����2�(�-2N��B�|����(�m��@���M:��C��@
�Y]ZRo�-
;"���e�B�� N6�E���4�F��K����ZP��t�8"�a�z���RxicE���]�}����?����q�^5�a�)^��\���U�I�\���&��0���y�Q��4���
�����!1��M&/�|BCD:���4$9�'W{}�����'��D�S��6
�q�ge$p
�8�9�Z��K��#l!���������_O�/`4n���R����
;L��&�w�cy9r�q�I�k�b_e�z#F.
7�0�~v����L�j4�����C�h]�/
!�~fz���!������$=�K8��Qj��R�$[-��K1_0��w&��B��/�a/�n�'���t���v�AE��	����F�����2���zR��\��p�1�
I��6^X����LO�//b:�W�*��@Y�Q��,$g\�����������+��	�?�fA*��>6#B�(Y�~���|�dF��k��s���Hm�Oi��:����w�6�lmg������D��F��&.2�%j�9Wn~z�<G[��c�f0�
���Z�5�4���=�u�+���u�����:�<7����~����I�`\�������1iZ2�����w��`C���TN1�-�CN��-�_�-�����j�7Z�C\f�D������4��Z���)@QqA"��:���8%QJ��N����;���$����|j[��47����Q���a:��.Q�$�K�����2#Jtge��T3��X>��;3�~|��g�!
`FK\�|���7��4x��l�6�a��81e�����I�*��E��&��k�OL�Q��@
��
�P�6.#������xzH����#3v*��5�sn	<�G*���f��������AB�#}7�������1���q&�d����L��l&RsD>X�x�7��h��o���3�u������U��W.�C�|q��N��7\-t�oC|������"�z����>,��/I�o�p�$�.n���x��'N����M�i�x��9G�A(9��sC'���u�]���X������<�����v�^�F5�MC;e4�����o�����=7�<���C�|���������hG���cu�]��1��������kS:8��MZ�}v�F�sh��g|�!�|V@�F��^	�/L���q��z�x������=7�+q��c��O������ht�4�k'u[iYc6q�aG����K�w�%h�cZ��0���a�x��lEku�������F, ����������7NkP����*�;��������;���-D�T��.��gl��U�������c�i����h����c�@���	�<t�|z��n��uzp��*x�����9��GFj���
"!�[x��,6u'���p����2,��w����n,� ��OkRwY�Ad�1G.�=�%���^2gd���i-��o	�t��?�I|z��eS�����/��]��j�k9��6PA���(���1R�?������~��������G!{� �,���R6�z�(QMTP��/�����V��0Gv�M��KL���2����6�us�����#`�N����*|��
���m��g�0�_c%��?�!�S_��s��WoO���-o���,��_�K]����qT}i)I����^^-�4}t�lG�X�D�������b2Bh��i�=3��@W�2�D�l�S6��M`KI��*�h�E�����L�P�)t�L�����X��H���4*��YB'��\�<�@Tay�6#b��/�����-��5@\��,���A�\����E�������H`ZY�8�@�[���k_������l����b��7D���z�>F6�Q*���w�+�}<�_!�
����[��v��7D��F�{�e�2���Ows+������/������n�)e�&�B�� �Bp|�]�J�iy��b*����A�9>C�cor�8��i���Ew�q_c�)����M����N��j�L)%���ny���N���o���������z���O��OE�����6��k����ht��o~	G���������5�c+�7����41�5�g�@��[�GE���)p����zO��#��]����6���� �&�8��66�	��q�����o`K8�>�����H�;��|Y�}=����'�^;��9WKa���F��M�E��9�>�2�5���la,$�	�Bv���J���m	/7��5T������JDh�W�j�njO"hD�|������9���`1.9U���D�cu�v�����F*_(;C��6�X�y���G�]�X�\�Y�ua�\�V��{� �}�������U����+��-TP}{>�d7<_,RK���a��K��s18����m�|�w�_.t8l� �!8�����u�&P�����ZF�^.�����Qy����f.�'�x ?y����~��9	��l���t��<��K��,m�6��]��[��D��&'y�R?G|��T��QMZ(���ZpM��L�{�Dap
��*�$�
�>[���I��Z��4�2�|:>��� fe1P��
���������"IBpq�����<����MFA�a:��A��1���Sf
U9B���������ac�0�4��fuM�H��3:9�F1b.5��s�Ra���'���1Hq������q��4T��-%���qL	����R*{�x�.]�A,�'�1�R�z���C�T~��x�&���y��?<�'+��\eU��,����fh�y��e�R��m4Y��t�Q
�������:R��7t�j����LT�T�q�����y?�����NJ�A�f_�S������)���

�y��sTfo���T/1#��������Pb�`+jh{_��/N�dJ������; 	2���*�4&�)�*�(:"�s��k�P���i�l|%����-T\1*�<,��Gh�G]o����=�`�n�����!*�i)��.7�s�Qz�U��R��e���������D`a���
���K�pk6�a?x����^���<m_Q����\m!H�����nn_����>��=���������/~n=���U��MTO	?�P������Z��RZ[�����^Y��0��������/W��Ya
\�����8�F�g�����������-T9!�>�G���tk��)�g�?R���1d>P�����������vZ������3�����t����"<�����n�m�n����p�=������
qN���l ZQx�p���>6���i�Q�&��+Q�8�Kcy-`=�������C�JM��SZg���W�o��v804_����^f���2��'��H B:��(� ����UJE�R����1&�F��2��{Q�NN�L/-� &M����_h	�$(��
G4'8NA��n��rv��(��Y�9	�d�"���n/#S+d�3�p�R��&l�����F��c�84�p;��"����Y-d�(+H����L�
9q��3������8����>Q����_3����b���-�T7���?���e7bqEO�&v�����������[���{�r-�F�z�U���Gq���_z5t/�����7O������(�ayX���l����OT����=�q�����0���\�K��!kb�rZ1h�p��5^1�5y�)�V��4k��D�lO���kL�F�V�i	��uV7�l9�!��E��do�0�e_��U���
T8�LC�@3���E�����IrY�Di�F�����������(6�>	��i4��R�]_	�S(��F�:
�m��,��m)�4'���^��>t��s�p�s	(fT98�+�(Jf�o#�=N����������?�����w�����Fz���d�Z� 
��U���a�����R[_V[z�Y�on��_���}�����_���}����~
����������m�m/��,�*��]:\��^���p{�g+ap5+*aV��(�r�}v��%}h���o~���jX�
N�����}��wW_=�����t�������\=���%(r���C���<��K�Z��~���V
��Zm�j<���e���h�8AYu�.�]�Sl�����@��?i�)�[(�R]��gi���m�LY��}��a���Bkfb�i�*�����c���{�����<�(b��.T�P�������B
���8������H��i	��)Q�f/���B�VH��g|P���"�p�0�Y��;%+�-d�I(s)=�e�s��#	�ix�J��|:��x����e���T!����������%!+1����{��3fik�_�Tkc������(T���)	6�[vcL�5'o7��K��;�"yU�W�FF��M���>
jSB��k���A�@��!6|�Pg����["�iA
�Z�c���\��ND�]E��Q��"T���h��[4J����+���(��4{���v�('����9�yK�D��L���}���)�_9
,�*@���5&Q�T>!���;��z��>��~;(�<"��B�"h�x�4�a�vJ�~;1U`�� Q�h�$�T�vo���m7~���H<}�*I?�*��.��p��xI	�f����C�C��/�983h1T��[�$�����pIaM��T�����.I�O2e�^-�
��+i� �#��n�:�����q��f�v�9�(�����������h���"o���]�V*lbj;dh�7m����`�� p�'�B��)��^5�Q�>DW�[�������9��\�o~	����}�O�K��m�k��]�;Z��`A3*U�D��/����	V��)�-v7F����H��������C�*�^���c��~����G�����g{t0W71`��B���Y7f/	�U����"��I
��<T�b�)���������c)��$��{>D�DOL���-z(�������i"�����\I�7e�a$jph������^%Ub"`;i�����S��%��A��4=D)O����W���es�3��e���*�M�����f���A�Q� x���Ox��Q���&�>�����=|����]e�3�D�'
���=	���m�"\��e	�R����V�,5����p��<4!D����&v09��j5E�(�'���rX6>�8��	
Tuw�]���\?�^�[�v�q�Df`�,|"L��Q1������p���u����O�_:}�K�^�`�VR����]�m�=�u�y�h�*bch;�+vZ]h�D;��b��(rU�r�:u�4��6*����n�%�
��7����@:�>z"��������iuZ�7lzo�����;ht~�n����Vn�M<~�����O����]�d�8�S����Ms`p���v9Cv�'�����BPJ���h�:gMu��@��Dz)q
������&��>+�IOo�?v�Hw�]�w\�_����q���`K�jf�����~��m#�n�)���z�N�M7l��rS�m��L������;h�V0�x�G���/G�`��6:"����D-����#�eR%��7���9G����ig
}|A ��H��g��3 �<���3}�4e���K�2���A����@ku�����b~��`�
o����������8gC)�gl#x+0��S�	�KU�n�����3���	t���%J���!p�f1���h3Tr�s�rR�L^JZV��9��Hb�Q7�be��h���m�����e*�d��$o�vfM*�d#E#���n�Pk�����t�u[������!����Pz���FFY��r}����\����m�J:��M������Sm)�Z���55�5a��?V9��i\������`�!Rb�������(��h���i�U���F\b0��}u-3#e����9
P�H�:%4�Tf)z��Lsv��&�r���j�5�D"
8/
�������]���m������~3^�$�N��0��ClC�A(XE#p"B�x��En08��b��O��������`OLj�&��8���������<��[�r��aVN���~)�2��
��ox������]��~��6���^{(�W��RDf-(��N1Sc�*�1����)��Y���Fs$�&��n��������0�a��.��l����`�g^����9!�)M����H�E�e
����������V�C$�g�#��h�&����bwD{�PG�j�PI9��*����#���}��x���������qu?����
r�im�&YeOW:U�Vf��23�V�S����YMO�;_B�8"D�a~�����1�M�|�tZ��
Z��M��l���_^��CK\Z�s�
�}A��?FM���X���=1�y��6J]i
�ZQ�o.Q������?&�V��4��G^����6���v�(����"3�9<��w*���l�js�[��!���Lg4��t=?���{�S��������?>���3�t�5=���37J�V����5K�����c���_
V�h���a;#&s�����"����!@i�����/�8������>�9++b�OlA�����1e�>���_"(�H����S������X��My���LUr)S���j�^1�Rc��1��YXi��F9�����w�g��Sq@���0���?�:�������`{��?-k9��� 4�WF�Gx������!*�#@���2���z��c��%�S���f��2=���3C��������~	���������_��{�����W>������\�_\}��;����>1�����1j����!.r�
��4��������i]i��<� |w�ca�W]���[8�������#[u���W����
��L;���k(��2�lE�}"��d�c%,������U��=�����K�kk��]����(nmi�%j�E��"-�v��Q�b2	Lu�Lu�����*0���\H``�����u��Z�R���W�N*�E�?GU�jH �%r�T9%��L����^\��.�e�/��z�����JM���yI���}*�*u�E�I��Ich/�/D#����t��_Z��Jz�3'�����-z� �3�Y-R*��:�Q#��<pTS*��'yh�����P���697R�mA4�������JY;h4���c)�<q��6*�K-w#�e|�X�=�T����i`k��J���`����8��T�����C�B��5Z)�|��Gu����=@B���-�T���U��>s���/Y��/�����_B�Y��������"<�
�����r��J��������7�q���Gb���d���O�	)^O#9��H5��d���0�@�?�<��k�)w}�`�����-���:gPF<�������Z��������:Ix��M�������L����m�n����
0��=8�2��s�;���I�a�yT��4u����������Q6��&����(�*u��y�zG��3�w�
�����Q,�S�1")�%D������b��y����TC��������t���0���a��H���W�(<+kZAE�9���J>�������v�5��*��n;�'V���bR���2�t=������R�J�a��/jI�(���Z���������])�n��S	$����(A�5����u�% �IEq������HQ@��j��B�����t�
�Y1G�&7�^~������vW
.7���u����IhT|��>��<>�N'wY#+��m=�I�L�GI �n2F���^���GAH)#�� (���v����kTlJ��(�J��m7~�?��&�����$w����A/�:K��?�+��@����
�"U����N��.�i�g-��AmxI�SR"9�)�����qt��|���tV�#|��UVb���\}P���cG�������&m��\:L���;w���o8��b�����7�My�o�YM��TA���:y�����4��\Q�K�s�YV�o�Rl+)��Mb�Q14�&�� 2��DW3��6���Qd��qs"��~��'��P��%m�
��<��3_��yV�p��
����SP��[-U����~�@q��T�
�Y�?������~?�%K&W �SB�����Di�d�d���7���)���e�����{;<'t������	O���U`^����2�����C%C�c�B���O�\��`.��������/��gi���v�8��1�1Jt�����5���x}���!LD�F<�yuK�n�B!������A�j\�')x�3�|O�`-#��EY��W��k"�c@��P�$V�>L�@1��a
�w��K�C���Xw"�z9$jA��������_�`yE��sG5�>�i	�"rRv�O<���~n�HF���5DB����BT/@�d���N��-@h}bB"����u�|=�C�K�l���Y�%8��{��)������S�����F7xD���;��X���4T�5B�v2G���B`%r����n��G�s�����r��z�"�;Z�)�����:}-z������|��K%x*�
)�e^A�H>�I�}_��w���5�`ZQ��Qf��t�J�����v��t���IZ�{���C~�
�2O�����x��?o�"���$��2F�&�@�U�g'�<��\��|�^]	����}Y��3�:�;�v���~�W4������NeVDA���C�9��zAU?R�q���mt����}.M~Y��e��r���"�Q�����Gwn0��"��S?)�3n!�b-�z��u�>��R��f�i�����1Y�������E��~�.eXJ��a���B4�S����m��k��m��9wP��I=�����f�A9L���%<|v1	_�N����#��6���O��u~�������������,m���]@��G()�S7�p���}�(�)J��6��$��(��I��S����d��	�����'������?\������z��k�>���7I�t������_���>^��2D��<q��R��b� nJCF�HJ�������������2���T��<�Z��K�$0�%	��ueA�u��|��	R�L*�xw�����8E��z��xEX�l
mw
/A^0�F�O9��=	x�v�K�����;P�(/p��E�&HV��K���!�2)�I�f��������/���:h�bdGn���X1^���L7��AL���RAm�^a}U�Q��wr���w;f���~�z�WT������!P�<��
���^\���[���AT�z����,Q�3�9�����U�����7�_��?��Vq���]}���m��g_������nn��y��aZB_t/�hf)�����n��������IZ��b�����[�:W�8r+1R��3���B�jZz�(�K�����P�c��W��{�{��>��A8���;Za|����]}��O����o��
�'�o��������[���!W9<��_�P�C���������w��o_�M��-J���ez��w��K��Q�v��In��K��K��j��l����

#69

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Rafia Sabih (#68)

Re: WIP: [[Parallel] Shared] Hash

Hi Thomas,

On 3/28/17 1:41 AM, Rafia Sabih wrote:

On Mon, Mar 27, 2017 at 12:20 PM, Thomas Munro

I thought this last point about Windows might be fatal to my design,
but it seems that Windows since at least version 2000 has support for
Unixoid unlinkability via the special flag FILE_SHARE_DELETE.

<...>

Please find the attached file for the explain analyse output of these
queries on head as well as patch.
Would be working on analysing the performance of this patch on 300 scale factor.

I have marked this submission "Waiting for Author". A new patch is
needed to address Andres' comments and you should have a look at Rafia's
results.

Thanks,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#70

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Andres Freund (#67)

Re: WIP: [[Parallel] Shared] Hash

Hi,

On 2017-03-27 22:33:03 -0700, Andres Freund wrote:

On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:

Here is a new patch series responding to feedback from Peter and Andres:

Here's a review of 0007 & 0010 together - they're going to have to be
applied together anyway...
...
ok, ENOTIME for today...

Continuing, where I dropped of tiredly yesterday.

-		ExecHashJoinSaveTuple(tuple,
-							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+		if (HashJoinTableIsShared(hashtable))
+			sts_puttuple(hashtable->shared_inner_batches, batchno, &hashvalue,
+						 tuple);
+		else
+			ExecHashJoinSaveTuple(tuple,
+								  hashvalue,
+								  &hashtable->innerBatchFile[batchno]);
 	}
 }

Why isn't this done inside of ExecHashJoinSaveTuple?

@@ -1280,6 +1785,68 @@ ExecHashTableReset(HashJoinTable hashtable)

+			/* Rewind the shared read heads for this batch, inner and outer. */
+			sts_prepare_parallel_read(hashtable->shared_inner_batches,
+									  curbatch);
+			sts_prepare_parallel_read(hashtable->shared_outer_batches,
+									  curbatch);

It feels somewhat wrong to do this in here, rather than on the callsites.

+		}
+
+		/*
+		 * Each participant needs to make sure that data it has written for
+		 * this partition is now read-only and visible to other participants.
+		 */
+		sts_end_write(hashtable->shared_inner_batches, curbatch);
+		sts_end_write(hashtable->shared_outer_batches, curbatch);
+
+		/*
+		 * Wait again, so that all workers see the new hash table and can
+		 * safely read from batch files from any participant because they have
+		 * all ended writing.
+		 */
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_RESETTING_BATCH(curbatch));
+		BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_RESETTING);
+		Assert(BarrierPhase(&hashtable->shared->barrier) ==
+			   PHJ_PHASE_LOADING_BATCH(curbatch));
+		ExecHashUpdate(hashtable);
+
+		/* Forget the current chunks. */
+		hashtable->current_chunk = NULL;
+		return;
+	}

/*
* Release all the hash buckets and tuples acquired in the prior pass, and
@@ -1289,10 +1856,10 @@ ExecHashTableReset(HashJoinTable hashtable)
oldcxt = MemoryContextSwitchTo(hashtable->batchCxt);

 	/* Reallocate and reinitialize the hash bucket headers. */
-	hashtable->buckets = (HashJoinTuple *)
-		palloc0(nbuckets * sizeof(HashJoinTuple));
+	hashtable->buckets = (HashJoinBucketHead *)
+		palloc0(nbuckets * sizeof(HashJoinBucketHead));

-	hashtable->spaceUsed = nbuckets * sizeof(HashJoinTuple);
+	hashtable->spaceUsed = nbuckets * sizeof(HashJoinBucketHead);

/* Cannot be more than our previous peak; we had this size before. */
Assert(hashtable->spaceUsed <= hashtable->spacePeak);
@@ -1301,6 +1868,22 @@ ExecHashTableReset(HashJoinTable hashtable)

 	/* Forget the chunks (the memory was freed by the context reset above). */
 	hashtable->chunks = NULL;
+
+	/* Rewind the shared read heads for this batch, inner and outer. */
+	if (hashtable->innerBatchFile[curbatch] != NULL)
+	{
+		if (BufFileSeek(hashtable->innerBatchFile[curbatch], 0, 0L, SEEK_SET))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+				   errmsg("could not rewind hash-join temporary file: %m")));
+	}
+	if (hashtable->outerBatchFile[curbatch] != NULL)
+	{
+		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+				   errmsg("could not rewind hash-join temporary file: %m")));
+	}
 }

 /*
@@ -1310,12 +1893,21 @@ ExecHashTableReset(HashJoinTable hashtable)
 void
 ExecHashTableResetMatchFlags(HashJoinTable hashtable)
 {
+	dsa_pointer chunk_shared = InvalidDsaPointer;
 	HashMemoryChunk chunk;
 	HashJoinTuple tuple;
 	int			i;

 	/* Reset all flags in the main table ... */
-	chunk = hashtable->chunks;
+	if (HashJoinTableIsShared(hashtable))
+	{
+		/* This only runs in the leader during rescan initialization. */
+		Assert(!IsParallelWorker());
+		hashtable->shared->chunk_work_queue = hashtable->shared->chunks;
+		chunk = pop_chunk_queue(hashtable, &chunk_shared);
+	}
+	else
+		chunk = hashtable->chunks;

Hm - doesn't pop_chunk_queue empty the work queue?

+/*
+ * Load a tuple into shared dense storage, like 'load_private_tuple'.  This
+ * version is for shared hash tables.
+ */
+static HashJoinTuple
+load_shared_tuple(HashJoinTable hashtable, MinimalTuple tuple,
+				  dsa_pointer *shared, bool respect_work_mem)
+{

Hm. Are there issues with "blessed" records being stored in shared
memory? I seem to recall you talking about it, but I see nothing
addressing the issue here? (later) Ah, I see - you just prohibit
paralleism in that case - might be worth pointing to.

+	/* Check if some other participant has increased nbatch. */
+	if (hashtable->shared->nbatch > hashtable->nbatch)
+	{
+		Assert(respect_work_mem);
+		ExecHashIncreaseNumBatches(hashtable, hashtable->shared->nbatch);
+	}
+
+	/* Check if we need to help shrinking. */
+	if (hashtable->shared->shrink_needed && respect_work_mem)
+	{
+		hashtable->current_chunk = NULL;
+		LWLockRelease(&hashtable->shared->chunk_lock);
+		return NULL;
+	}
+
+	/* Oversized tuples get their own chunk. */
+	if (size > HASH_CHUNK_THRESHOLD)
+		chunk_size = size + HASH_CHUNK_HEADER_SIZE;
+	else
+		chunk_size = HASH_CHUNK_SIZE;
+
+	/* If appropriate, check if work_mem would be exceeded by a new chunk. */
+	if (respect_work_mem &&
+		hashtable->shared->grow_enabled &&
+		hashtable->shared->nbatch <= MAX_BATCHES_BEFORE_INCREASES_STOP &&
+		(hashtable->shared->size +
+		 chunk_size) > (work_mem * 1024L *
+						hashtable->shared->planned_participants))
+	{
+		/*
+		 * It would be exceeded.  Let's increase the number of batches, so we
+		 * can try to shrink the hash table.
+		 */
+		hashtable->shared->nbatch *= 2;
+		ExecHashIncreaseNumBatches(hashtable, hashtable->shared->nbatch);
+		hashtable->shared->chunk_work_queue = hashtable->shared->chunks;
+		hashtable->shared->chunks = InvalidDsaPointer;
+		hashtable->shared->shrink_needed = true;
+		hashtable->current_chunk = NULL;
+		LWLockRelease(&hashtable->shared->chunk_lock);
+
+		/* The caller needs to shrink the hash table. */
+		return NULL;
+	}

Hm - we could end up calling ExecHashIncreaseNumBatches twice here?
Probably harmless.

/* ----------------------------------------------------------------
  *		ExecHashJoin
@@ -129,6 +200,14 @@ ExecHashJoin(HashJoinState *node)
 					/* no chance to not build the hash table */
 					node->hj_FirstOuterTupleSlot = NULL;
 				}
+				else if (hashNode->shared_table_data != NULL)
+				{
+					/*
+					 * The empty-outer optimization is not implemented for
+					 * shared hash tables yet.
+					 */
+					node->hj_FirstOuterTupleSlot = NULL;

Hm, why is this checking for the shared-ness of the join in a different
manner?

+					if (HashJoinTableIsShared(hashtable))
+					{
+						/*
+						 * An important optimization: if this is a
+						 * single-batch join and not an outer join, there is
+						 * no reason to synchronize again when we've finished
+						 * probing.
+						 */
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+						if (hashtable->nbatch == 1 && !HJ_FILL_INNER(node))
+							return NULL;	/* end of join */
+
+						/*
+						 * Check if we are a leader that can't go further than
+						 * probing the first batch, to avoid risk of deadlock
+						 * against workers.
+						 */
+						if (!LeaderGateCanContinue(&hashtable->shared->leader_gate))
+						{
+							/*
+							 * Other backends will need to handle all future
+							 * batches written by me.  We don't detach until
+							 * after we've finished writing to all batches so
+							 * that they are flushed, otherwise another
+							 * participant might try to read them too soon.
+							 */
+							sts_end_write_all_partitions(hashNode->shared_inner_batches);
+							sts_end_write_all_partitions(hashNode->shared_outer_batches);
+							BarrierDetach(&hashtable->shared->barrier);
+							hashtable->detached_early = true;
+							return NULL;
+						}
+
+						/*
+						 * We can't start searching for unmatched tuples until
+						 * all participants have finished probing, so we
+						 * synchronize here.
+						 */
+						Assert(BarrierPhase(&hashtable->shared->barrier) ==
+							   PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+						if (BarrierWait(&hashtable->shared->barrier,
+										WAIT_EVENT_HASHJOIN_PROBING))
+						{
+							/* Serial phase: prepare for unmatched. */
+							if (HJ_FILL_INNER(node))
+							{
+								hashtable->shared->chunk_work_queue =
+									hashtable->shared->chunks;
+								hashtable->shared->chunks = InvalidDsaPointer;
+							}
+						}

Couldn't we skip that if this isn't an outer join? Not sure if the
complication would be worth it...

+void
+ExecShutdownHashJoin(HashJoinState *node)
+{
+	/*
+	 * By the time ExecEndHashJoin runs in a work, shared memory has been

s/work/worker/

+	 * destroyed.  So this is our last chance to do any shared memory cleanup.
+	 */
+	if (node->hj_HashTable)
+		ExecHashTableDetach(node->hj_HashTable);
+}

+           There is no extra charge
+	 * for probing the hash table for outer path row, on the basis that
+	 * read-only access to a shared hash table shouldn't be any more
+	 * expensive.
+	 */

Hm, that's debatable. !shared will mostly be on the local numa node,
shared probably not.

* Get hash table size that executor would use for inner relation.
 	 *
+	 * Shared hash tables are allowed to use the work_mem of all participants
+	 * combined to make up for the fact that there is only one copy shared by
+	 * all.

Hm. I don't quite understand that reasoning.

* XXX for the moment, always assume that skew optimization will be
* performed. As long as SKEW_WORK_MEM_PERCENT is small, it's not worth
* trying to determine that for sure.

If we don't do skew for parallelism, should we skip that bit?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Rafia Sabih (#68)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Mar 28, 2017 at 11:11 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

On Mon, Mar 27, 2017 at 12:20 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sun, Mar 26, 2017 at 3:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

But... what you said above must be a problem for Windows. I believe
it doesn't allow files to be unlinked if they are open, and I see that
DSM segments are cleaned up in resowner's phase ==
RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
RESOURCE_RELEASE_AFTER_LOCKS.

I thought this last point about Windows might be fatal to my design,
but it seems that Windows since at least version 2000 has support for
Unixoid unlinkability via the special flag FILE_SHARE_DELETE.

On testing v10 of this patch over commit
b54aad8e34bd6299093e965c50f4a23da96d7cc3 and applying the tweak
mentioned in [1], for TPC-H queries I found the results quite
encouraging,

Experimental setup:
TPC-H scale factor - 20
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
random_page_cost = seq_page_cost = 0.1
max_parallel_workers_per_gather = 4

Performance numbers:
(Time in seconds)
Query | Head | Patch |
-------------------------------
Q3 | 73 | 37 |
Q5 | 56 | 31 |
Q7 | 40 | 30 |
Q8 | 8 | 8 |
Q9 | 85 | 42 |
Q10 | 86 | 46 |
Q14 | 11 | 6 |
Q16 | 32 | 11 |
Q21 | 53 | 56 |

Please find the attached file for the explain analyse output of these
queries on head as well as patch.
Would be working on analysing the performance of this patch on 300 scale factor.

[1] /messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com
--

Before moving to higher scale I tried playing around work_mem effects
on this patch and came across following results,
All settings are kept as before with the exception of work_mem that is
set to 64MB.

Most of the queries showed similar performance except a few, details
are as follows,
(all time are given in ms)
Query | Head | Patch
---------+----------+--------
Q8 | 8720 | 8839
Q18 | 370710 | 384347
Q21 | 53270 | 65189

Clearly, regression in Q8 and Q18 is minor but that in Q21 is
significant. Just to confirm, I have applied the tweak mentioned in
[1]: /messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com
For the explain analyse output of Q21 on head and with patch, please
check the attached file.

[1]: /messages/by-id/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#72

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#70)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

Hi hackers,

Thanks very much to Rafia for testing, and to Andres for his copious
review feedback. Here's a new version. Changes:

1. Keep all the backing files that are part of a BufFileSet in
subdirectories, as suggested by Andres. Now, instead of that
unpopular logic for scanning ranges of possible file paths to delete,
we can just blow away whole directories that group sets of related
files.

2. Don't expose 'participant' and 'partition' concepts, Andres didn't
like much, in the BufFile API. There is a new concept 'stripe' which
client code of BufFileSet can use to specify the participant number in
a more general way without saying so: it's really just a way to spread
files over tablespaces. I'm not sure if tablespaces are really used
that way much, but it seemed like Peter wasn't going to be too happy
with a proposal that didn't do *something* to respect the existing
temp_tablespaces GUC beahviour (and he'd be right). But I didn't
think it would make any kind of sense at all to stripe by 1GB segments
as private BufFiles do when writing from multiple processes, as I have
argued elsewhere, hence this scheme.

The 'qunique' function used here (basically poor man's std::unique) is
one I proposed earlier, with the name suggested by Tom Lane:

See /messages/by-id/CAEepm=2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA@mail.gmail.com
.

3. Merged the single-batch and multi-batch patches into one.
EarlierI had the idea that it was easier to review them in layers
since I hoped people might catch a glimpse of the central simplicity
without being hit by a wall of multi-batch logic, but since Andres is
reviewing and disagrees, I give you 0010-hj-parallel-v11.patch which
weighs in at 32 files changed, 2278 insertions(+), 250 deletions(-).

4. Moved the DSM handling to the every end of resowner.c's cleanup.
Peter pointed out that it would otherwise happen before fd.c Files are
closed. He was concerned about a different aspect of that which I'm
not sure I fully understand, but at the very least it seemed to
represent a significant problem for this design on Windows. I
discussed this briefly with Robert off-list and he told me that there
is probably no good reason for the ordering that we have, and what's
more, there may be good arguments even outside this case for DSM
segments being cleaned up as late as possible, now that they contain
shared control information and not just tuple data as once had been
imagined. I can't think of any reason why this would not be safe.
Can you?

5. The empty inner relation optimisation implemented.

Some smaller changes and miles of feedback inline below:

On Mon, Mar 27, 2017 at 11:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Mon, Mar 27, 2017 at 9:41 AM, Andres Freund <andres@anarazel.de> wrote:
SharedBufFile allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment. This includes
changes to fd.c and buffile.c to support new kinds of temporary file.
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 4ca0ea4..a509c05 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
I think the new facilities should be explained in the file's header.
Will do.

Done.

@@ -68,9 +71,10 @@ struct BufFile
* avoid making redundant FileSeek calls.
*/
-       bool            isTemp;                 /* can only add files if this is TRUE */
+       bool            isSegmented;    /* can only add files if this is TRUE */
That's a bit of a weird and uncommented upon change.
I was trying to cut down on the number of places we use the word
'temporary' to activate various different behaviours. In this case,
the only thing it controls is whether the BufFile is backed by one
single fd.c File or many segments, so I figured it should be renamed.

As Peter and you have pointed out, there may be a case for removing it
altogether.

Done in 0007-hj-remove-buf-file-is-temp-v11.patch.

@@ -79,6 +83,8 @@ struct BufFile
*/
ResourceOwner resowner;

+ BufFileTag tag; /* for discoverability between backends */

Not perfectly happy with the name tag here, the name is a bit too
similar to BufferTag - something quite different.

Yeah, will rename.

Done. That existed only because I had sharedbuffile.c which needed
special access to buffile.c via those weird 'tag' interfaces. In the
new version that isn't required, and a new struct BufFileSet is
provided by buffile.c/h.

+static void
+make_tagged_path(char *tempdirpath, char *tempfilepath,
+                                const BufFileTag *tag, int segment)
+{
+       if (tag->tablespace == DEFAULTTABLESPACE_OID ||
+               tag->tablespace == GLOBALTABLESPACE_OID)
+               snprintf(tempdirpath, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
+       else
+       {
+               snprintf(tempdirpath, MAXPGPATH, "pg_tblspc/%u/%s/%s",
+                                tag->tablespace, TABLESPACE_VERSION_DIRECTORY,
+                                PG_TEMP_FILES_DIR);
+       }
+
+       snprintf(tempfilepath, MAXPGPATH, "%s/%s%d.%d.%d.%d.%d", tempdirpath,
+                        PG_TEMP_FILE_PREFIX,
+                        tag->creator_pid, tag->set, tag->partition, tag->participant,
+                        segment);

Is there a risk that this ends up running afoul of filename length
limits on some platforms?

The names are shorter now, and split over two levels:

pgsql_tmp37303.2.set/pgsql_tmp.p30.b0.0

If we do decide not to change this: Why is that sufficient? Doesn't the
same problem exist for segments later than the first?

It does exist and it is handled. The comment really should say
"unlinking segment N + 1 (if it exists) before creating segment N".
Will update.

I got rid of this. This doesn't come up anymore because the patch now
blows away whole directories. There is never a case where files left
over after a crash-restart would confuse us. There may be left over
directories, but if we find that we can't create a directory, we try
to delete it and all its contents first (ie to see if there was a
leftover directory from before a crash-restart) and then try again, so
individual segment files shouldn't be able to confuse us.

+ * PathNameCreateTemporaryFile, PathNameOpenTemporaryFile and
+ * PathNameDeleteTemporaryFile are used for temporary files that may be shared
+ * between backends.  A File created or opened with these functions is not
+ * automatically deleted when the file is closed, but it is automatically
+ * closed and end of transaction and counts agains the temporary file limit of
+ * the backend that created it.  Any File created this way must be explicitly
+ * deleted with PathNameDeleteTemporaryFile.  Automatic file deletion is not
+ * provided because this interface is designed for use by buffile.c and
+ * indirectly by sharedbuffile.c to implement temporary files with shared
+ * ownership and cleanup.

Hm. Those name are pretty easy to misunderstand, no? s/Temp/Shared/?

Hmm. Yeah these may be better. Will think about that.

I like these names. This is fd.c providing named temporary files.
They are definitely temporary files still: they participate in the
total temp limit and logging/pgstat and they are automatically closed.
The only different things are: they have names permitting opening by
other backends, and (it follows) are not automatically deleted on
close. buffile.c takes over that duty using a BufFileSet.

+File
+PathNameOpenTemporaryFile(char *tempfilepath)
+{
+       File file;
+
+       /*
+        * Open the file.  Note: we don't use O_EXCL, in case there is an orphaned
+        * temp file that can be reused.
+        */
+       file = PathNameOpenFile(tempfilepath, O_RDONLY | PG_BINARY, 0);

If so, wouldn't we need to truncate the file?

Yes, this lacks O_TRUNC. Thanks.

Actually the reason I did that is because I wanted to open the file
with O_RDONLY, which is incompatible with O_TRUNC. Misleading comment
removed.

+ * A single SharedBufFileSet can manage any number of 'tagged' BufFiles that
+ * are shared between a fixed number of participating backends.  Each shared
+ * BufFile can be written to by a single participant but can be read by any
+ * backend after it has been 'exported'.  Once a given BufFile is exported, it
+ * becomes read-only and cannot be extended.  To create a new shared BufFile,
+ * a participant needs its own distinct participant number, and needs to
+ * specify an arbitrary partition number for the file.  To make it available
+ * to other backends, it must be explicitly exported, which flushes internal
+ * buffers and renders it read-only.  To open a file that has been shared, a
+ * backend needs to know the number of the participant that created the file,
+ * and the partition number.  It is the responsibily of calling code to ensure
+ * that files are not accessed before they have been shared.

Hm. One way to make this safer would be to rename files when exporting.
Should be sufficient to do this to the first segment, I guess.

Interesting idea. Will think about that. That comment isn't great
and repeats itself. Will improve.

Comment improved. I haven't investigated a file-renaming scheme for
exporting files yet.

+ * Each file is identified by a partition number and a participant number, so
+ * that a SharedBufFileSet can be viewed as a 2D table of individual files.
I think using "files" as a term here is a bit dangerous - they're
individually segmented again, right?
True. It's a 2D matrix of BufFiles. The word "file" is super
overloaded here. Will fix.

No longer present.

+/*
+ * The number of bytes of shared memory required to construct a
+ * SharedBufFileSet.
+ */
+Size
+SharedBufFileSetSize(int participants)
+{
+       return offsetof(SharedBufFileSet, participants) +
+               sizeof(SharedBufFileParticipant) * participants;
+}
The function name sounds a bit like a function actuallize setting some
size... s/Size/DetermineSize/?
Hmm yeah "set" as verb vs "set" as noun. I think "estimate" is the
established word for this sort of thing (even though that seems
strange because it sounds like it doesn't have to be exactly right:
clearly in all these shmem-space-reservation functions it has to be
exactly right). Will change.

Done. (Of course 'estimate' is both a noun and a verb too, and for
extra points pronounced differently...)

+/*
+ * Create a new file suitable for sharing.  Each backend that calls this must
+ * use a distinct participant number.  Behavior is undefined if a participant
+ * calls this more than once for the same partition number.  Partitions should
+ * ideally be numbered consecutively or in as small a range as possible,
+ * because file cleanup will scan the range of known partitions looking for
+ * files.
+ */

Wonder if we shouldn't just create a directory for all such files.

Hmm. Yes, that could work well. Will try that.

Done.

I'm a bit unhappy with the partition terminology around this. It's
getting a bit confusing. We have partitions, participants and
segements. Most of them could be understood for something entirely
different than the meaning you have here...

Ok. Let me try to explain [explanation...].

(Perhaps SharedBufFileSet should be called PartitionedBufFileSet?)

I got rid of most of that terminology. Now I have BufFileSet which is
a set of named BufFiles and it's up to client code to manage the
namespace within it. SharedTuplestore happens to build names that
include partition and participant numbers, but that's its business.
There is also a 'stripe' number, which is used as a way to spread
files across multiple temp_tablespaces.

+static void
+shared_buf_file_on_dsm_detach(dsm_segment *segment, Datum datum)
+{
+       bool unlink_files = false;
+       SharedBufFileSet *set = (SharedBufFileSet *) DatumGetPointer(datum);
+
+       SpinLockAcquire(&set->mutex);
+       Assert(set->refcount > 0);
+       if (--set->refcount == 0)
+               unlink_files = true;
+       SpinLockRelease(&set->mutex);
I'm a bit uncomfortable with releasing a refcount, and then still using
the memory from the set... I don't think there's a concrete danger
here as the code stands, but it's a fairly dangerous pattern.
Will fix.

I could fix that but I'd feel bad about doing more work while holding
the spinlock (even though it can't possibly be contended because we
are the last to detach). I have added a comment to explain that it's
safe to continue accessing the DSM segment while in this function
body.

On Mon, Mar 27, 2017 at 10:47 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:

Here is a new patch series responding to feedback from Peter and Andres:
+
+/* Per-participant shared state. */
+typedef struct SharedTuplestoreParticipant
+{
+       LWLock lock;
Hm. No padding (ala LWLockMinimallyPadded / LWLockPadded) - but that's
probably ok, for now.

I hunted around but didn't see an idiom for making this whole struct
cacheline-sized.

+       bool error;                                     /* Error occurred flag. */
+       bool eof;                                       /* End of file reached. */
+       int read_fileno;                        /* BufFile segment file number. */
+       off_t read_offset;                      /* Offset within segment file. */

Hm. I wonder if it'd not be better to work with 64bit offsets, and just
separate that out upon segment access.

This falls out of the current two-part BufFileTell and BufFileSeek
interface. Since translation could be done trivially
(single_address_space_offset = fileno * MAX_PHYSICAL_FILESIZE +
offset), that might be a reasonable refactoring, but it seems to be
material for a separate patch, considering that other client code
would be affected, no?

+/* The main data structure in shared memory. */

"main data structure" isn't particularly meaningful.

Fixed.

+struct SharedTuplestore
+{
+       int reading_partition;
+       int nparticipants;
+       int flags;
Maybe add a comment saying /* flag bits from SHARED_TUPLESTORE_* */?

Done.

+ Size meta_data_size;

What's this?

Comments added to every struct member.

+ SharedTuplestoreParticipant participants[FLEXIBLE_ARRAY_MEMBER];

I'd add a comment here, that there's further data after participants.

Done.

+};
+
+/* Per-participant backend-private state. */
+struct SharedTuplestoreAccessor
+{
Hm. The name and it being backend-local are a bit conflicting.

Hmm. It's a (SharedTupleStore) Accessor, not a Shared (...). Not
sure if we have an established convention for this kind of thing...

+       int participant;                        /* My partitipant number. */
+       SharedTuplestore *sts;          /* The shared state. */
+       int nfiles;                                     /* Size of local files array. */
+       BufFile **files;                        /* Files we have open locally for writing. */

Shouldn't this mention that it's indexed by partition?

Done.

+       BufFile *read_file;                     /* The current file to read from. */
+       int read_partition;                     /* The current partition to read from. */
+       int read_participant;           /* The current participant to read from. */
+       int read_fileno;                        /* BufFile segment file number. */
+       off_t read_offset;                      /* Offset within segment file. */
+};

+/*
+ * Initialize a SharedTuplestore in existing shared memory.  There must be
+ * space for sts_size(participants) bytes.  If flags is set to the value
+ * SHARED_TUPLESTORE_SINGLE_PASS then each partition may only be read once,
+ * because underlying files will be deleted.

Any reason not to use flags that are compatible with tuplestore.c?

tuplestore.c uses some executor.h flags like EXEC_FLAG_MARK.
sharedtuplestore.c's interface and capabilities are extremely
primitive and only really let it do exactly what I needed to do here.
Namely, every participant writes into its own set of partition files,
and then all together we perform a single "partial scan" in some
undefined order to get all the tuples back and share them out between
backends. Extending it to behave more like the real tuplestore may be
interesting for other projects (dynamic partitioning etc) but it
didn't seem like a good idea to speculate on what exactly would be
needed. This particular flag means 'please delete individual backing
files as we go after reading them', and I don't believe there is any
equivalent; someone thought the private HJ should do that so I figured
I should do it here too.

+ * Tuples that are stored may optionally carry a piece of fixed sized
+ * meta-data which will be retrieved along with the tuple.  This is useful for
+ * the hash codes used for multi-batch hash joins, but could have other
+ * applications.
+ */
+SharedTuplestoreAccessor *
+sts_initialize(SharedTuplestore *sts, int participants,
+                          int my_participant_number,
+                          Size meta_data_size,
+                          int flags,
+                          dsm_segment *segment)
+{

Not sure I like that the naming here has little in common with
tuplestore.h's api.

Hmm. I feel like its interface needs to be significantly different to
express the things it needs to do, especially at initialisation. As
for the tuple write/write interface, how would you improve this?

sts_puttuple(...);
sts_puttuple(...);
...
sts_end_write_all_partitions(...);

sts_prepare_partial_scan(...); /* in one backend only */
sts_begin_partial_scan(...);
... = sts_gettuple(...);
... = sts_gettuple(...);
...
sts_end_partial_scan(...);

One thought that I keep having: the private hash join code should also
use tuplestore. But a smarter tuplestore that knows how to hold onto
the hash value (the meta-data in my sharedtuplestore.c) and knows
about partitions (batches). It would be nice if the private and
shared batching code finished up harmonised in this respect.

+
+MinimalTuple
+sts_gettuple(SharedTuplestoreAccessor *accessor, void *meta_data)
+{

This needs docs.

Done.

+       SharedBufFileSet *fileset = GetSharedBufFileSet(accessor->sts);
+       MinimalTuple tuple = NULL;
+
+       for (;;)
+       {

...
+               /* Check if this participant's file has already been entirely read. */
+               if (participant->eof)
+               {
+                       BufFileClose(accessor->read_file);
+                       accessor->read_file = NULL;
+                       LWLockRelease(&participant->lock);
+                       continue;

Why are we closing the file while holding the lock?

Fixed.

+
+               /* Read the optional meta-data. */
+               eof = false;
+               if (accessor->sts->meta_data_size > 0)
+               {
+                       nread = BufFileRead(accessor->read_file, meta_data,
+                                                               accessor->sts->meta_data_size);
+                       if (nread == 0)
+                               eof = true;
+                       else if (nread != accessor->sts->meta_data_size)
+                               ereport(ERROR,
+                                               (errcode_for_file_access(),
+                                                errmsg("could not read from temporary file: %m")));
+               }
+
+               /* Read the size. */
+               if (!eof)
+               {
+                       nread = BufFileRead(accessor->read_file, &tuple_size, sizeof(tuple_size));
+                       if (nread == 0)
+                               eof = true;

Why is it legal to have EOF here, if metadata previously didn't have an
EOF? Perhaps add an error if accessor->sts->meta_data_size != 0?

Improved comments.

+               if (eof)
+               {
+                       participant->eof = true;
+                       if ((accessor->sts->flags & SHARED_TUPLESTORE_SINGLE_PASS) != 0)
+                               SharedBufFileDestroy(fileset, accessor->read_partition,
+                                                                        accessor->read_participant);
+
+                       participant->error = false;
+                       LWLockRelease(&participant->lock);
+
+                       /* Move to next participant's file. */
+                       BufFileClose(accessor->read_file);
+                       accessor->read_file = NULL;
+                       continue;
+               }
+
+               /* Read the tuple. */
+               tuple = (MinimalTuple) palloc(tuple_size);
+               tuple->t_len = tuple_size;

Done.

On Tue, Mar 28, 2017 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:

Here is a new patch series responding to feedback from Peter and Andres:

Here's a review of 0007 & 0010 together - they're going to have to be
applied together anyway...

I have now merged them FWIW.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ac339fb566..775c9126c7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3814,6 +3814,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>

+     <varlistentry id="guc-cpu-shared-tuple-cost" xreflabel="cpu_shared_tuple_cost">
+      <term><varname>cpu_shared_tuple_cost</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>cpu_shared_tuple_cost</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the cost of sharing rows in
+        memory during a parallel query.
+        The default is 0.001.
+       </para>
+      </listitem>
+     </varlistentry>
+

Isn't that really low in comparison to the other costs? I think
specifying a bit more what this actually measures would be good too - is
it putting the tuple in shared memory? Is it accessing it?

Yeah. It was really just to make the earlier Shared Hash consistently
more expensive than private Hash, by a tiny amount. Then it wouldn't
kick in until it could help you avoid batching.

I will try to come up with some kind of argument based on data...

+     <varlistentry id="guc-cpu-synchronization-cost" xreflabel="cpu_synchronization_cost">
+      <term><varname>cpu_synchronization_cost</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>cpu_synchronization_cost</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the cost of waiting at synchronization
+        points for other processes while executing parallel queries.
+        The default is 1.0.
+       </para>
+      </listitem>
+     </varlistentry>

Isn't this also really cheap in comparison to a, probably cached, seq
page read?

It's not really the synchronisation primitive itself, which is fast,
it's how long the other guys may spend doing other stuff before they
reach the barrier. Currently we have a block granularity parallel
query system, so really this is an estimation of how long the average
participant will have to wait for the last of its peers to finish
chewing on up to one page of tuples from its (ultimate) source of
parallelism. Yeah I'm waffling a bit because I don't have a
principled answer to this question yet...

+       if (HashJoinTableIsShared(hashtable))
+       {
+               /*
+                * Synchronize parallel hash table builds.  At this stage we know that
+                * the shared hash table has been created, but we don't know if our
+                * peers are still in MultiExecHash and if so how far through.  We use
+                * the phase to synchronize with them.
+                */
+               barrier = &hashtable->shared->barrier;
+
+               switch (BarrierPhase(barrier))
+               {
+               case PHJ_PHASE_BEGINNING:

Note pgindent will indent this further. Might be worthwhile to try to
pgindent the file, revert some of the unintended damage.

Fixed switch statement indentation. I will try pgindent soon and see
how badly it all breaks.

/*
* set expression context
*/

I'd still like this to be moved to the start.

Done.

@@ -126,17 +202,79 @@ MultiExecHash(HashState *node)
/* Not subject to skew optimization, so insert normally */
ExecHashTableInsert(hashtable, slot, hashvalue);
}
-                       hashtable->totalTuples += 1;
+                       hashtable->partialTuples += 1;
+                       if (!HashJoinTableIsShared(hashtable))
+                               hashtable->totalTuples += 1;
}
}

FWIW, I'd put HashJoinTableIsShared() into a local var - the compiler
won't be able to do that on its own because external function calls
could invalidate the result.

Done in in the hot loops.

That brings me to a related topic: Have you measured whether your
changes cause performance differences?

I have never succeeded in measuring any reproducible difference
between master with 0 workers and my patch with the 0 workers on
various contrived queries and TPCH queries (except the ones where my
patch makes certain outer joins faster for known reasons). I suspect
it just spends to much time ping ponging in and out of the node for
each tuple for tiny differences in coding to show up. But I could be
testing for the wrong things...

+ finish_loading(hashtable);

I find the sudden switch to a different naming scheme in the same file a
bit jarring.

Ok. I have now changed all of the static functions in nodeHash.c from
foo_bar to ExecHashFooBar.

+       if (HashJoinTableIsShared(hashtable))
+               BarrierDetach(&hashtable->shared->shrink_barrier);
+
+       if (HashJoinTableIsShared(hashtable))
+       {

Consecutive if blocks with the same condition...

Fixed.

+               bool elected_to_resize;
+
+               /*
+                * Wait for all backends to finish building.  If only one worker is
+                * running the building phase because of a non-partial inner plan, the
+                * other workers will pile up here waiting.  If multiple worker are
+                * building, they should finish close to each other in time.
+                */

That comment is outdated, isn't it?

Yes, fixed.

/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-       if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-               ExecHashIncreaseNumBuckets(hashtable);
+       ExecHashUpdate(hashtable);
+       ExecHashIncreaseNumBuckets(hashtable);

So this now doesn't actually increase the number of buckets anymore.

Well that function always returned if found there were already enough
buckets, so either the test at call site or in the function was
redundant. I have renamed it to ExecHashIncreaseNumBucketsIfNeeded()
to make that clearer.

+ reinsert:
+       /* If the table was resized, insert tuples into the new buckets. */
+       ExecHashUpdate(hashtable);
+       ExecHashReinsertAll(hashtable);
ReinsertAll just happens to do nothing if we didn't have to
resize... Not entirely obvious, sure reads as if it were unconditional.
Also, it's not actually "All" when batching is in use, no?

Renamed to ExecHashReinsertHashtableIfNeeded.

+ post_resize:
+       if (HashJoinTableIsShared(hashtable))
+       {
+               Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+               BarrierWait(barrier, WAIT_EVENT_HASH_RESIZING);
+               Assert(BarrierPhase(barrier) == PHJ_PHASE_REINSERTING);
+       }
+
+ reinsert:
+       /* If the table was resized, insert tuples into the new buckets. */
+       ExecHashUpdate(hashtable);
+       ExecHashReinsertAll(hashtable);

Hm. So even non-resizing backends reach this - but they happen to not
do anything because there's no work queued up, right? That's, uh, not
obvious.

Added comments to that effect.

For me the code here would be a good bit easier to read if we had a
MultiExecHash and MultiExecParallelHash. Half of MultiExecHash is just
if(IsShared) blocks, and copying would avoid potential slowdowns.

Hmm. Yeah I have struggled with this question in several places. For
example I have ExecHashLoadPrivateTuple and ExecHashLoadSharedTuple
because the intertwangled version was unbearable. But in
MultiExecHash's case, I feel there is some value in showing that the
basic hash build steps are the same. The core loop, where the main
action really happens, is unchanged.

+               /*
+                * Set up for skew optimization, if possible and there's a need for
+                * more than one batch.  (In a one-batch join, there's no point in
+                * it.)
+                */
+               if (nbatch > 1)
+                       ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);

So there's no equivalent to the skew optimization for parallel query
yet... It doesn't sound like that should be particulalry hard on first
blush?

Making the skew table shared, setting up buckets for MVCs, build and
probing it is easy. It's work_mem exhaustion and shrinking and
related jiggery pokery that'll be tricky, but I'll shortly be looking
at that with vigour and vim. That there may be one or two empty
relation optimisations that I haven't got yet because they involve a
bit of extra communication.

static void
-ExecHashIncreaseNumBatches(HashJoinTable hashtable)
+ExecHashIncreaseNumBatches(HashJoinTable hashtable, int nbatch)
So this doesn't actually increase the number of batches anymore... At
the very least this should mention that the main work is done in
ExecHashShrink.

Yeah. Done.

+/*
+ * Process the queue of chunks whose tuples need to be redistributed into the
+ * correct batches until it is empty.  In the best case this will shrink the
+ * hash table, keeping about half of the tuples in memory and sending the rest
+ * to a future batch.
+ */
+static void
+ExecHashShrink(HashJoinTable hashtable)

Should mention this really only is meaningful after
ExecHashIncreaseNumBatches has run.

Updated.

+{
+       long            ninmemory;
+       long            nfreed;
+       dsa_pointer chunk_shared;
+       HashMemoryChunk chunk;

-       /* If know we need to resize nbuckets, we can do it while rebatching. */
-       if (hashtable->nbuckets_optimal != hashtable->nbuckets)
+       if (HashJoinTableIsShared(hashtable))
{
-               /* we never decrease the number of buckets */
-               Assert(hashtable->nbuckets_optimal > hashtable->nbuckets);
+               /*
+                * Since a newly launched participant could arrive while shrinking is
+                * already underway, we need to be able to jump to the correct place
+                * in this function.
+                */
+               switch (PHJ_SHRINK_PHASE(BarrierPhase(&hashtable->shared->shrink_barrier)))
+               {
+               case PHJ_SHRINK_PHASE_BEGINNING: /* likely case */
+                       break;
+               case PHJ_SHRINK_PHASE_CLEARING:
+                       goto clearing;
+               case PHJ_SHRINK_PHASE_WORKING:
+                       goto working;
+               case PHJ_SHRINK_PHASE_DECIDING:
+                       goto deciding;
+               }

Hm, so we jump into different nesting levels here :/

I rewrote this without goto. Mea culpa.

ok, ENOTIME for today...

Thanks! Was enough to keep me busy for some time...

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index f2c885afbe..87d8f3766e 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -6,10 +6,78 @@
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- *
* IDENTIFICATION
*       src/backend/executor/nodeHashjoin.c
*
+ * NOTES:
+ *
+ * PARALLELISM
+ *
+ * Hash joins can participate in parallel queries in two ways: in
+ * non-parallel-aware mode, where each backend builds an identical hash table
+ * and then probes it with a partial outer relation, or parallel-aware mode
+ * where there is a shared hash table that all participants help to build.  A
+ * parallel-aware hash join can save time and space by dividing the work up
+ * and sharing the result, but has extra communication overheads.

There's a third, right? The hashjoin, and everything below it, could
also not be parallel, but above it could be some parallel aware node
(e.g. a parallel aware HJ).

Yeah that's the same thing: it's not aware of parallelism. Its outer
plan may be partial or not, and it doesn't even know. That's the
distinction I'm trying to make clear: actually doing something special
for parallelism. I've update the text slightly to say that the outer
plan may be partial or not in a hash join that is under Gather.

+ * In both cases, hash joins use a private state machine to track progress
+ * through the hash join algorithm.
That's not really parallel specific, right? Perhaps just say that
parallel HJs use the normal state machine?

Updated.

+ * In a parallel-aware hash join, there is also a shared 'phase' which
+ * co-operating backends use to synchronize their local state machine and
+ * program counter with the multi-process join.  The phase is managed by a
+ * 'barrier' IPC primitive.
Hm. I wonder if 'phase' shouldn't just be name
sharedHashJoinState. Might be a bit easier to understand than a
different terminology.

Hmm. Well it is a lot like a state machine but it might be more
confusing to have both local and shared 'state'. I think 'phases' of
parallel computation are quite intuitive. I'm rather attached to this
terminology...

+ * The phases are as follows:
+ *
+ *   PHJ_PHASE_BEGINNING   -- initial phase, before any participant acts
+ *   PHJ_PHASE_CREATING           -- one participant creates the shmem hash table
+ *   PHJ_PHASE_BUILDING           -- all participants build the hash table
+ *   PHJ_PHASE_RESIZING           -- one participant decides whether to expand buckets
+ *   PHJ_PHASE_REINSERTING -- all participants reinsert tuples if necessary
+ *   PHJ_PHASE_PROBING    -- all participants probe the hash table
+ *   PHJ_PHASE_UNMATCHED   -- all participants scan for unmatched tuples
I think somewhere here - and probably around the sites it's happening -
should mention that state transitions are done kinda implicitly via
BarrierWait progressing to the numerically next phase. That's not
entirely obvious (and actually limits what the barrier mechanism can be
used for...).

Yeah. Added comments.

On Wed, Mar 29, 2017 at 9:31 AM, Andres Freund <andres@anarazel.de> wrote:

-               ExecHashJoinSaveTuple(tuple,
-                                                         hashvalue,
-                                                         &hashtable->innerBatchFile[batchno]);
+               if (HashJoinTableIsShared(hashtable))
+                       sts_puttuple(hashtable->shared_inner_batches, batchno, &hashvalue,
+                                                tuple);
+               else
+                       ExecHashJoinSaveTuple(tuple,
+                                                                 hashvalue,
+                                                                 &hashtable->innerBatchFile[batchno]);
}
}

Why isn't this done inside of ExecHashJoinSaveTuple?

I had it that way earlier but the arguments got ugly. I suppose it
could take an SOMETHING_INNER/SOMETHING_OUTER enum and a partition
number.

I wonder if SharedTuplestore should be able to handle the private case too...

@@ -1280,6 +1785,68 @@ ExecHashTableReset(HashJoinTable hashtable)

+                       /* Rewind the shared read heads for this batch, inner and outer. */
+                       sts_prepare_parallel_read(hashtable->shared_inner_batches,
+                                                                         curbatch);
+                       sts_prepare_parallel_read(hashtable->shared_outer_batches,
+                                                                         curbatch);

It feels somewhat wrong to do this in here, rather than on the callsites.

The private hash table code does the moral equivalent directly below:
it uses BufFileSeek to rewind the current inner and outer batch to the
start.

+               }
+
+               /*
+                * Each participant needs to make sure that data it has written for
+                * this partition is now read-only and visible to other participants.
+                */
+               sts_end_write(hashtable->shared_inner_batches, curbatch);
+               sts_end_write(hashtable->shared_outer_batches, curbatch);
+
+               /*
+                * Wait again, so that all workers see the new hash table and can
+                * safely read from batch files from any participant because they have
+                * all ended writing.
+                */
+               Assert(BarrierPhase(&hashtable->shared->barrier) ==
+                          PHJ_PHASE_RESETTING_BATCH(curbatch));
+               BarrierWait(&hashtable->shared->barrier, WAIT_EVENT_HASH_RESETTING);
+               Assert(BarrierPhase(&hashtable->shared->barrier) ==
+                          PHJ_PHASE_LOADING_BATCH(curbatch));
+               ExecHashUpdate(hashtable);
+
+               /* Forget the current chunks. */
+               hashtable->current_chunk = NULL;
+               return;
+       }

/*
* Release all the hash buckets and tuples acquired in the prior pass, and
@@ -1289,10 +1856,10 @@ ExecHashTableReset(HashJoinTable hashtable)
oldcxt = MemoryContextSwitchTo(hashtable->batchCxt);

/* Reallocate and reinitialize the hash bucket headers. */
-       hashtable->buckets = (HashJoinTuple *)
-               palloc0(nbuckets * sizeof(HashJoinTuple));
+       hashtable->buckets = (HashJoinBucketHead *)
+               palloc0(nbuckets * sizeof(HashJoinBucketHead));

-       hashtable->spaceUsed = nbuckets * sizeof(HashJoinTuple);
+       hashtable->spaceUsed = nbuckets * sizeof(HashJoinBucketHead);

/* Cannot be more than our previous peak; we had this size before. */
Assert(hashtable->spaceUsed <= hashtable->spacePeak);
@@ -1301,6 +1868,22 @@ ExecHashTableReset(HashJoinTable hashtable)

/* Forget the chunks (the memory was freed by the context reset above). */
hashtable->chunks = NULL;
+
+       /* Rewind the shared read heads for this batch, inner and outer. */
+       if (hashtable->innerBatchFile[curbatch] != NULL)
+       {
+               if (BufFileSeek(hashtable->innerBatchFile[curbatch], 0, 0L, SEEK_SET))
+                       ereport(ERROR,
+                                       (errcode_for_file_access(),
+                                  errmsg("could not rewind hash-join temporary file: %m")));
+       }
+       if (hashtable->outerBatchFile[curbatch] != NULL)
+       {
+               if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
+                       ereport(ERROR,
+                                       (errcode_for_file_access(),
+                                  errmsg("could not rewind hash-join temporary file: %m")));
+       }
}

/*
@@ -1310,12 +1893,21 @@ ExecHashTableReset(HashJoinTable hashtable)
void
ExecHashTableResetMatchFlags(HashJoinTable hashtable)
{
+       dsa_pointer chunk_shared = InvalidDsaPointer;
HashMemoryChunk chunk;
HashJoinTuple tuple;
int                     i;

/* Reset all flags in the main table ... */
-       chunk = hashtable->chunks;
+       if (HashJoinTableIsShared(hashtable))
+       {
+               /* This only runs in the leader during rescan initialization. */
+               Assert(!IsParallelWorker());
+               hashtable->shared->chunk_work_queue = hashtable->shared->chunks;
+               chunk = pop_chunk_queue(hashtable, &chunk_shared);
+       }
+       else
+               chunk = hashtable->chunks;

Hm - doesn't pop_chunk_queue empty the work queue?

Well first it puts the main chunks onto the work queue, and then it
pops them off one by one clearing flags until there is nothing left on
the work queue. But this is only running in one backend. It's not
very exciting. Do you see a bug here?

+/*
+ * Load a tuple into shared dense storage, like 'load_private_tuple'.  This
+ * version is for shared hash tables.
+ */
+static HashJoinTuple
+load_shared_tuple(HashJoinTable hashtable, MinimalTuple tuple,
+                                 dsa_pointer *shared, bool respect_work_mem)
+{
Hm. Are there issues with "blessed" records being stored in shared
memory? I seem to recall you talking about it, but I see nothing
addressing the issue here? (later) Ah, I see - you just prohibit
paralleism in that case - might be worth pointing to.

Note added.

I had difficulty testing that. I couldn't create anonymous ROW(...)
values without the project moving above the hash table. Andrew Gierth
showed me a way to prevent that with OFFSET 0 but that disabled
parallelism. I tested that code by writing extra test code to dump
the output of tlist_references_transient_type() on the tlists of
various test paths not in a parallel query. Ideas welcome, as I feel
like this belongs in a regression test.

+       /* Check if some other participant has increased nbatch. */
+       if (hashtable->shared->nbatch > hashtable->nbatch)
+       {
+               Assert(respect_work_mem);
+               ExecHashIncreaseNumBatches(hashtable, hashtable->shared->nbatch);
+       }
+
+       /* Check if we need to help shrinking. */
+       if (hashtable->shared->shrink_needed && respect_work_mem)
+       {
+               hashtable->current_chunk = NULL;
+               LWLockRelease(&hashtable->shared->chunk_lock);
+               return NULL;
+       }
+
+       /* Oversized tuples get their own chunk. */
+       if (size > HASH_CHUNK_THRESHOLD)
+               chunk_size = size + HASH_CHUNK_HEADER_SIZE;
+       else
+               chunk_size = HASH_CHUNK_SIZE;
+
+       /* If appropriate, check if work_mem would be exceeded by a new chunk. */
+       if (respect_work_mem &&
+               hashtable->shared->grow_enabled &&
+               hashtable->shared->nbatch <= MAX_BATCHES_BEFORE_INCREASES_STOP &&
+               (hashtable->shared->size +
+                chunk_size) > (work_mem * 1024L *
+                                               hashtable->shared->planned_participants))
+       {
+               /*
+                * It would be exceeded.  Let's increase the number of batches, so we
+                * can try to shrink the hash table.
+                */
+               hashtable->shared->nbatch *= 2;
+               ExecHashIncreaseNumBatches(hashtable, hashtable->shared->nbatch);
+               hashtable->shared->chunk_work_queue = hashtable->shared->chunks;
+               hashtable->shared->chunks = InvalidDsaPointer;
+               hashtable->shared->shrink_needed = true;
+               hashtable->current_chunk = NULL;
+               LWLockRelease(&hashtable->shared->chunk_lock);
+
+               /* The caller needs to shrink the hash table. */
+               return NULL;
+       }

Hm - we could end up calling ExecHashIncreaseNumBatches twice here?
Probably harmless.

Yes. In the code higher up we could observe that someone else has
increased the number of batches: here we are just updating our local
hashtable->nbatch. Then further down we could decide that it needs to
be done again because we work out that this allocation will push us
over the work_mem limit. Really that function just *sets* the number
of batches. It's really the code beginning hashtable->shared->nbatch
*= 2 that is really increasing the number of batches and setting up
the state for all participants to shrink the hash table and free up
some memory.

/* ----------------------------------------------------------------
*             ExecHashJoin
@@ -129,6 +200,14 @@ ExecHashJoin(HashJoinState *node)
/* no chance to not build the hash table */
node->hj_FirstOuterTupleSlot = NULL;
}
+                               else if (hashNode->shared_table_data != NULL)
+                               {
+                                       /*
+                                        * The empty-outer optimization is not implemented for
+                                        * shared hash tables yet.
+                                        */
+                                       node->hj_FirstOuterTupleSlot = NULL;

Hm, why is this checking for the shared-ness of the join in a different
manner?

The usual manner is HashJoinTableIsShare(hashtable) but you see
Assert(hashtable == NULL) a few lines earlier; this is the
HJ_BUILD_HASHTABLE state where it hasn't been constructed yet. When
ExecHashTableCreate (a bit further down) constructs it it'll assign
hashtable->shared = state->shared_table_data (to point to a bit of DSM
memory). The reason the usual test is based on the HashJoinTable
pointer usually called 'hashtable' is because that is passed around
almost everywhere so it's convenient to use that.

+                                       if (HashJoinTableIsShared(hashtable))
+                                       {
+                                               /*
+                                                * An important optimization: if this is a
+                                                * single-batch join and not an outer join, there is
+                                                * no reason to synchronize again when we've finished
+                                                * probing.
+                                                */
+                                               Assert(BarrierPhase(&hashtable->shared->barrier) ==
+                                                          PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+                                               if (hashtable->nbatch == 1 && !HJ_FILL_INNER(node))
+                                                       return NULL;    /* end of join */
+
+                                               /*
+                                                * Check if we are a leader that can't go further than
+                                                * probing the first batch, to avoid risk of deadlock
+                                                * against workers.
+                                                */
+                                               if (!LeaderGateCanContinue(&hashtable->shared->leader_gate))
+                                               {
+                                                       /*
+                                                        * Other backends will need to handle all future
+                                                        * batches written by me.  We don't detach until
+                                                        * after we've finished writing to all batches so
+                                                        * that they are flushed, otherwise another
+                                                        * participant might try to read them too soon.
+                                                        */
+                                                       sts_end_write_all_partitions(hashNode->shared_inner_batches);
+                                                       sts_end_write_all_partitions(hashNode->shared_outer_batches);
+                                                       BarrierDetach(&hashtable->shared->barrier);
+                                                       hashtable->detached_early = true;
+                                                       return NULL;
+                                               }
+
+                                               /*
+                                                * We can't start searching for unmatched tuples until
+                                                * all participants have finished probing, so we
+                                                * synchronize here.
+                                                */
+                                               Assert(BarrierPhase(&hashtable->shared->barrier) ==
+                                                          PHJ_PHASE_PROBING_BATCH(hashtable->curbatch));
+                                               if (BarrierWait(&hashtable->shared->barrier,
+                                                                               WAIT_EVENT_HASHJOIN_PROBING))
+                                               {
+                                                       /* Serial phase: prepare for unmatched. */
+                                                       if (HJ_FILL_INNER(node))
+                                                       {
+                                                               hashtable->shared->chunk_work_queue =
+                                                                       hashtable->shared->chunks;
+                                                               hashtable->shared->chunks = InvalidDsaPointer;
+                                                       }
+                                               }

Couldn't we skip that if this isn't an outer join? Not sure if the
complication would be worth it...

Yes, well we don't even get this far in the very common case of a
single batch inner join (see note above that about an "important
optimization"). If it's outer you need this, and if there are
multiple batches it hardly matters if you have to go through this
extra step. But you're right that there are a few missed
opportunities here and there.

+void
+ExecShutdownHashJoin(HashJoinState *node)
+{
+       /*
+        * By the time ExecEndHashJoin runs in a work, shared memory has been

s/work/worker/

Fixed.

+        * destroyed.  So this is our last chance to do any shared memory cleanup.
+        */
+       if (node->hj_HashTable)
+               ExecHashTableDetach(node->hj_HashTable);
+}

+           There is no extra charge
+        * for probing the hash table for outer path row, on the basis that
+        * read-only access to a shared hash table shouldn't be any more
+        * expensive.
+        */

Hm, that's debatable. !shared will mostly be on the local numa node,
shared probably not.

Agreed, NUMA surely changes the situation for probing. I wonder if it
deserves a separate GUC. I'm actually quite hesitant to try to model
things like that because it seems like a can of worms. I will try to
come up with some numbers backed up with data though. Watch this
space.

* Get hash table size that executor would use for inner relation.
*
+        * Shared hash tables are allowed to use the work_mem of all participants
+        * combined to make up for the fact that there is only one copy shared by
+        * all.

Hm. I don't quite understand that reasoning.

Our model for memory usage limits is that every instance of an
executor node is allowed to allocate up to work_mem. If I run a
parallel hash join in 9.6 with 3 workers and I have set work_mem to
10MB, then the system will attempt to stay under 10MB in each
participant, using up to 40MB across the 4 processes.

The goal of Parallel Shared Hash is to divide the work of building the
hash table up over the 4 backends, and combine the work_mem of the 4
backends to create a shared hash table. The total amount of memory
used is the same, but we make much better use of it. Make sense?

* XXX for the moment, always assume that skew optimization will be
* performed. As long as SKEW_WORK_MEM_PERCENT is small, it's not worth
* trying to determine that for sure.

If we don't do skew for parallelism, should we skip that bit?

I am looking into the skew optimisation. Will report back on that
soon, and also try to get some data relevant to costing.

--
Thomas Munro
http://www.enterprisedb.com

#73

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#72)

Re: WIP: [[Parallel] Shared] Hash

Hi Thomas,

On 2017-03-31 17:53:12 +1300, Thomas Munro wrote:

Thanks very much to Rafia for testing, and to Andres for his copious
review feedback. Here's a new version. Changes:

I've not looked at that aspect, but one thing I think would be good is
to first add patch that increases coverage of nodeHash[join].c to nearly
100%. There's currently significant bits of nodeHash.c that aren't
covered (skew optimization, large tuples).

https://coverage.postgresql.org/src/backend/executor/nodeHash.c.gcov.html
https://coverage.postgresql.org/src/backend/executor/nodeHashjoin.c.gcov.html

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#74

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#72)

Re: WIP: [[Parallel] Shared] Hash

Hi,

On 2017-03-31 17:53:12 +1300, Thomas Munro wrote:

Thanks very much to Rafia for testing, and to Andres for his copious
review feedback. Here's a new version. Changes:

I unfortunately think it's too late to get this into v10. There's still
heavy development going on, several pieces changed quite noticeably
since the start of the CF and there's still features missing. Hence I
think this unfortunately has to be pushed - as much as I'd have liked to
have this in 10.

Do you agree?

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#75

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#74)

Re: WIP: [[Parallel] Shared] Hash

On Tue, Apr 4, 2017 at 9:11 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-03-31 17:53:12 +1300, Thomas Munro wrote:

Thanks very much to Rafia for testing, and to Andres for his copious
review feedback. Here's a new version. Changes:

I unfortunately think it's too late to get this into v10. There's still
heavy development going on, several pieces changed quite noticeably
since the start of the CF and there's still features missing. Hence I
think this unfortunately has to be pushed - as much as I'd have liked to
have this in 10.

Do you agree?

Agreed.

Thank you very much Andres, Ashutosh, Peter, Rafia and Robert for all
the review, testing and discussion so far.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#76

Oleg Golovanov

rentech@mail.ru

almost 9 years ago

In reply to: Thomas Munro (#75)

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

Hi.

I got errors of patching on CentOS 7:

bash-4.2$ grep Hunk *.log | grep FAILED
0005-hj-leader-gate-v11.patch.log:Hunk #1 FAILED at 14.
0010-hj-parallel-v11.patch.log:Hunk #2 FAILED at 2850.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 622.
0010-hj-parallel-v11.patch.log:Hunk #6 FAILED at 687.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 153. What is wrong? The sources were clean:

bash-4.2$ git status
# On branch master
nothing to commit, working directory clean

I was patching by the command:
patch -b -i ../.patches/parallel-shared-hash-v11/0001-hj-refactor-memory-accounting-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0001-hj-refactor-memory-accounting-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0002-hj-refactor-batch-increases-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0002-hj-refactor-batch-increases-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0003-hj-refactor-unmatched-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0003-hj-refactor-unmatched-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0004-hj-barrier-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0004-hj-barrier-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0005-hj-leader-gate-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0005-hj-leader-gate-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0006-hj-let-node-have-seg-in-worker-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0006-hj-let-node-have-seg-in-worker-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0007-hj-remove-buf-file-is-temp-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0007-hj-remove-buf-file-is-temp-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0008-hj-buf-file-set-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0008-hj-buf-file-set-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0009-hj-shared-tuplestore-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0009-hj-shared-tuplestore-v11.patch.log
patch -b -i ../.patches/parallel-shared-hash-v11/0010-hj-parallel-v11.patch -p1 --verbose > ../.patches/parallel-shared-hash-v11/0010-hj-parallel-v11.patch.log Best Regards,

Oleg Golovanov
Moscow, Russia

Show quoted text

Вторник, 4 апреля 2017, 0:28 +03:00 от Thomas Munro <thomas.munro@enterprisedb.com>:

On Tue, Apr 4, 2017 at 9:11 AM, Andres Freund < andres@anarazel.de > wrote:

Hi,

On 2017-03-31 17:53:12 +1300, Thomas Munro wrote:

Thanks very much to Rafia for testing, and to Andres for his copious
review feedback. Here's a new version. Changes:

I unfortunately think it's too late to get this into v10. There's still
heavy development going on, several pieces changed quite noticeably
since the start of the CF and there's still features missing. Hence I
think this unfortunately has to be pushed - as much as I'd have liked to
have this in 10.

Do you agree?

Agreed.

Thank you very much Andres, Ashutosh, Peter, Rafia and Robert for all
the review, testing and discussion so far.

--
Thomas Munro
http://www.enterprisedb.com

#77

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Oleg Golovanov (#76)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Apr 13, 2017 at 10:04 PM, Oleg Golovanov <rentech@mail.ru> wrote:

bash-4.2$ grep Hunk *.log | grep FAILED
0005-hj-leader-gate-v11.patch.log:Hunk #1 FAILED at 14.
0010-hj-parallel-v11.patch.log:Hunk #2 FAILED at 2850.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 622.
0010-hj-parallel-v11.patch.log:Hunk #6 FAILED at 687.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 153.

Hi Oleg

Thanks for looking at this. It conflicted with commit 9c7f5229. Here
is a rebased patch set.

This version also removes some code for dealing with transient record
types which didn't work out. I'm trying to deal with that problem
separately[1]/messages/by-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com and in a general way so that the parallel hash join
patch doesn't have to deal with it at all.

[1]: /messages/by-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

#78

Oleg Golovanov

rentech@mail.ru

over 8 years ago

In reply to: Thomas Munro (#77)

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

Hi.

Thanks for rebased patch set v12. Currently I try to use this patch on my new test site and get following:

Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
|index bdf15621c83..e9db8880161 100644
|--- a/src/include/access/parallel.h
|+++ b/src/include/access/parallel.h
--------------------------
patching file src/include/access/parallel.h
Using Plan A...
Hunk #1 FAILED at 58.
1 out of 1 hunk FAILED -- saving rejects to file src/include/access/parallel.h.rej

Can you actualize your patch set? The error got from 0010-hj-parallel-v12.patch.

Best Regards,

Oleg Golovanov
Moscow, Russia

Show quoted text

Четверг, 13 апреля 2017, 13:49 +03:00 от Thomas Munro <thomas.munro@enterprisedb.com>:

On Thu, Apr 13, 2017 at 10:04 PM, Oleg Golovanov < rentech@mail.ru > wrote:

bash-4.2$ grep Hunk *.log | grep FAILED
0005-hj-leader-gate-v11.patch.log:Hunk #1 FAILED at 14.
0010-hj-parallel-v11.patch.log:Hunk #2 FAILED at 2850.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 622.
0010-hj-parallel-v11.patch.log:Hunk #6 FAILED at 687.
0010-hj-parallel-v11.patch.log:Hunk #1 FAILED at 21.
0010-hj-parallel-v11.patch.log:Hunk #3 FAILED at 153.

Hi Oleg

Thanks for looking at this. It conflicted with commit 9c7f5229. Here
is a rebased patch set.

This version also removes some code for dealing with transient record
types which didn't work out. I'm trying to deal with that problem
separately[1] and in a general way so that the parallel hash join
patch doesn't have to deal with it at all.

[1] /messages/by-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#79

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Oleg Golovanov (#78)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Apr 27, 2017 at 5:13 AM, Oleg Golovanov <rentech@mail.ru> wrote:

Can you actualize your patch set? The error got from
0010-hj-parallel-v12.patch.

I really should get around to setting up a cron job to tell me about
that. Here's a rebased version.

The things currently on my list for this patch are:

1. Implement the skew optimisation.
2. Consider Andres's suggestion of splitting MultiExecHash into two
functions, serial and parallel version, rather than having all those
conditional blocks in there.
3. Figure out whether the shared BufFile stuff I propose would work
well for Peter Geoghegan's parallel tuple sort patch, by trying it
(I've made a start, more soon).
4. Figure out how the costing model needs to be tweaked, probably
based on experimentation.

I'm taking a short break to work on other things right now but will
post a version with those changes soon.

--
Thomas Munro
http://www.enterprisedb.com

#80

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Thomas Munro (#79)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Thu, Apr 27, 2017 at 11:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Apr 27, 2017 at 5:13 AM, Oleg Golovanov <rentech@mail.ru> wrote:

Can you actualize your patch set? The error got from
0010-hj-parallel-v12.patch.

I really should get around to setting up a cron job to tell me about
that. Here's a rebased version.

Rebased.

--
Thomas Munro
http://www.enterprisedb.com

#81

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Thomas Munro (#80)

1 attachment(s)

Re: WIP: [[Parallel] Shared] Hash

On Mon, May 22, 2017 at 6:39 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Apr 27, 2017 at 11:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Apr 27, 2017 at 5:13 AM, Oleg Golovanov <rentech@mail.ru> wrote:

Can you actualize your patch set? The error got from
0010-hj-parallel-v12.patch.

I really should get around to setting up a cron job to tell me about
that. Here's a rebased version.

Rebased.

Rebased for the recent re-indent and shm_toc API change; no functional
changes in this version.

(I have a new patch set in the pipeline adding the skew optimisation
and some other things, more on that soon.)

--
Thomas Munro
http://www.enterprisedb.com

#82

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Thomas Munro (#81)

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

On Wed, Jun 28, 2017 at 9:58 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Rebased for the recent re-indent and shm_toc API change; no functional
changes in this version.

(I have a new patch set in the pipeline adding the skew optimisation
and some other things, more on that soon.)

This patch does not apply. And the thread has stalled for three months
now but I cannot see a review for what has been submitted. I am moving
it to next CF with waiting on author. Please provide a rebased
version. If there are other threads on this topic, it would be nice to
link them to the existing CF entry
https://commitfest.postgresql.org/15/871/..
--
Michael

#83

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Michael Paquier (#82)

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

On Thu, Nov 30, 2017 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

This patch does not apply. And the thread has stalled for three months
now but I cannot see a review for what has been submitted. I am moving
it to next CF with waiting on author. Please provide a rebased
version. If there are other threads on this topic, it would be nice to
link them to the existing CF entry
https://commitfest.postgresql.org/15/871/..

Thanks. There is in fact a second thread with updated patches and
current discussion, and it is listed in the CF entry. It would be
nice if the CF could show more clearly which thread is 'active' (for
the benefit of humans and also robots), and list any others as
archived/old/history.

--
Thomas Munro
http://www.enterprisedb.com