[POC] Faster processing at Gather node

Started by Rafia Sabih over 8 years ago · 64 messages
#1Rafia Sabih
rafia.sabih@enterprisedb.com
1 attachment(s)

Hello everybody,

While analysing the performance of TPC-H queries for the newly developed
parallel operators, viz. parallel index scan, parallel bitmap heap scan,
etc., we noticed that the time taken by the Gather node is significant. On
investigation we found that, under the current method, each tuple is copied
to the shared queue individually and the receiver is notified each time.
Since this copying is done in the shared queue, it incurs a lot of locking
and latching overhead.

So, in this POC patch I copy all the tuples into a local queue first, thus
avoiding those locks and latches. Once the local queue is filled to
capacity, the tuples are transferred to the shared queue, and only after
all the tuples have been transferred is the receiver notified.

With this patch I could see a significant improvement in performance for
simple queries:

head:
explain analyse select * from t where i < 30000000;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..83225.55 rows=29676454 width=19) (actual
time=1.379..35871.235 rows=29999999 loops=1)
Workers Planned: 64
Workers Launched: 64
-> Parallel Seq Scan on t (cost=0.00..83225.55 rows=463695 width=19)
(actual time=0.125..1415.521 rows=461538 loops=65)
Filter: (i < 30000000)
Rows Removed by Filter: 1076923
Planning time: 0.180 ms
Execution time: 38503.478 ms
(8 rows)

patch:
explain analyse select * from t where i < 30000000;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..83225.55 rows=29676454 width=19) (actual
time=0.980..24499.427 rows=29999999 loops=1)
Workers Planned: 64
Workers Launched: 64
-> Parallel Seq Scan on t (cost=0.00..83225.55 rows=463695 width=19)
(actual time=0.088..968.406 rows=461538 loops=65)
Filter: (i < 30000000)
Rows Removed by Filter: 1076923
Planning time: 0.158 ms
Execution time: 27331.849 ms
(8 rows)

head:
explain analyse select * from t where i < 40000000;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..83225.55 rows=39501511 width=19) (actual
time=0.890..38438.753 rows=39999999 loops=1)
Workers Planned: 64
Workers Launched: 64
-> Parallel Seq Scan on t (cost=0.00..83225.55 rows=617211 width=19)
(actual time=0.074..1235.180 rows=615385 loops=65)
Filter: (i < 40000000)
Rows Removed by Filter: 923077
Planning time: 0.113 ms
Execution time: 41609.855 ms
(8 rows)

patch:
explain analyse select * from t where i < 40000000;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..83225.55 rows=39501511 width=19) (actual
time=1.085..31806.671 rows=39999999 loops=1)
Workers Planned: 64
Workers Launched: 64
-> Parallel Seq Scan on t (cost=0.00..83225.55 rows=617211 width=19)
(actual time=0.083..954.342 rows=615385 loops=65)
Filter: (i < 40000000)
Rows Removed by Filter: 923077
Planning time: 0.151 ms
Execution time: 35341.429 ms
(8 rows)

head:
explain analyse select * from t where i < 45000000;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..102756.80 rows=44584013 width=19) (actual
time=0.563..49156.252 rows=44999999 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Parallel Seq Scan on t (cost=0.00..102756.80 rows=1393250 width=19)
(actual time=0.069..1905.436 rows=1363636 loops=33)
Filter: (i < 45000000)
Rows Removed by Filter: 1666667
Planning time: 0.106 ms
Execution time: 52722.476 ms
(8 rows)

patch:
explain analyse select * from t where i < 45000000;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..102756.80 rows=44584013 width=19) (actual
time=0.545..37501.200 rows=44999999 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Parallel Seq Scan on t (cost=0.00..102756.80 rows=1393250 width=19)
(actual time=0.068..2165.430 rows=1363636 loops=33)
Filter: (i < 45000000)
Rows Removed by Filter: 1666667
Planning time: 0.087 ms
Execution time: 41458.969 ms
(8 rows)

The performance improvement is greatest when the selectivity is around
20-30%, a range in which parallelism is currently not selected at all.

I am testing the performance impact of this on TPC-H queries; in the
meantime, I would appreciate some feedback on the design, etc.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

faster_gather.patch (application/octet-stream)
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 8d7e711b3b..e3405b255d 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -59,6 +59,7 @@
  */
 #define TUPLE_QUEUE_MODE_CONTROL	'c' /* mode-switch message contents */
 #define TUPLE_QUEUE_MODE_DATA		'd'
+#define LOCAL_TUPLE_QUEUE_SIZE		32768
 
 /*
  * Both the sender and receiver build trees of TupleRemapInfo nodes to help
@@ -145,6 +146,10 @@ typedef struct TQueueDestReceiver
 	char		mode;			/* current message mode */
 	TupleDesc	tupledesc;		/* current top-level tuple descriptor */
 	TupleRemapInfo **field_remapinfo;	/* current top-level remap info */
+	char *iovec;
+	int         length;
+	int         count;
+
 } TQueueDestReceiver;
 
 /*
@@ -213,6 +218,7 @@ static TupleRemapInfo *BuildRangeRemapInfo(Oid rngtypid,
 					MemoryContext mycontext);
 static TupleRemapInfo **BuildFieldRemapInfo(TupleDesc tupledesc,
 					MemoryContext mycontext);
+static void empty_tqueue(TQueueDestReceiver *tqueue);
 
 
 /*
@@ -304,10 +310,53 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
 		}
 	}
 
-	/* Send the tuple itself. */
+	/* Store tuples in the local queue. */
 	tuple = ExecMaterializeSlot(slot);
-	result = shm_mq_send(tqueue->queue, tuple->t_len, tuple->t_data, false);
-
+	if(TupIsNull(slot))
+		empty_tqueue(tqueue);
+	else
+	{
+		if (tqueue->length + tuple->t_len < LOCAL_TUPLE_QUEUE_SIZE)
+		{
+			shm_mq_iovec *iov = tqueue->iovec + tqueue->length;
+			iov->len = tuple->t_len;
+			tqueue->length += sizeof (shm_mq_iovec);
+			iov->data = tqueue->iovec + tqueue->length;
+
+			/*store the tuple */
+			memcpy(iov->data, tuple->t_data, tuple->t_len);
+			tqueue->length += tuple->t_len;
+			tqueue->count++;
+			return true;
+		}
+		/* once local tuple queue is full, pass them to the shared queue */
+		else
+		{
+			int byte = 0;
+			tqueue->length = 0;
+			while(tqueue->count-- > 0)
+			{
+				shm_mq_iovec *iov = tqueue->iovec + byte;
+				/* notify the receiver only when all the tuples are sent to share queue */
+				if (tqueue->count == 0)
+					result = local_mq_send(tqueue->queue, iov->len, iov->data, false, true);
+				else
+					result = local_mq_send(tqueue->queue, iov->len, iov->data, false, false);
+				byte += sizeof(shm_mq_iovec) + iov->len;
+			}
+			tqueue->count = 0;
+			shm_mq_iovec *iov = tqueue->iovec + tqueue->length;
+			iov->len = tuple->t_len;
+			tqueue->length += sizeof (shm_mq_iovec);
+			iov->data = tqueue->iovec + tqueue->length;
+
+			/*store the tuple */
+			memcpy(iov->data, tuple->t_data, tuple->t_len);
+			tqueue->length += tuple->t_len;
+			tqueue->count++;
+			return true;
+		}
+	}
 	/* Check for failure. */
 	if (result == SHM_MQ_DETACHED)
 		return false;
@@ -318,6 +367,22 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
 
 	return true;
 }
+/* Empty the slot, if there is any content left in it */
+static void
+empty_tqueue(TQueueDestReceiver *tqueue)
+{
+		int byte = 0;
+		shm_mq_result result;
+		while(tqueue->count-- > 0)
+		{
+			shm_mq_iovec *iov = tqueue->iovec + byte;
+			if (tqueue->count == 0)
+				local_mq_send(tqueue->queue, iov->len, iov->data, false, true);
+			else
+				local_mq_send(tqueue->queue, iov->len, iov->data, false, false);
+			byte += sizeof(shm_mq_iovec) + iov->len;
+		}
+}
 
 /*
  * Examine the given datum and send any necessary control messages for
@@ -577,7 +642,7 @@ static void
 tqueueShutdownReceiver(DestReceiver *self)
 {
 	TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
-
+	empty_tqueue(tqueue);
 	shm_mq_detach(shm_mq_get_queue(tqueue->queue));
 }
 
@@ -622,6 +687,8 @@ CreateTupleQueueDestReceiver(shm_mq_handle *handle)
 	/* Top-level tupledesc is not known yet */
 	self->tupledesc = NULL;
 	self->field_remapinfo = NULL;
+	self->iovec = palloc0(LOCAL_TUPLE_QUEUE_SIZE);
+	self->length = 0;
 
 	return (DestReceiver *) self;
 }
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index f5bf807cd6..3e5f17756e 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -141,6 +141,8 @@ struct shm_mq_handle
 
 static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
+static shm_mq_result local_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+				  const void *data, bool nowait, Size *bytes_written);
 static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
 					 bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
@@ -327,6 +329,16 @@ shm_mq_send(shm_mq_handle *mqh, Size nbytes, const void *data, bool nowait)
 
 	return shm_mq_sendv(mqh, &iov, 1, nowait);
 }
+shm_mq_result
+local_mq_send(shm_mq_handle *mqh, Size nbytes, const void *data, bool nowait, bool notify)
+{
+	shm_mq_iovec iov;
+
+	iov.data = data;
+	iov.len = nbytes;
+
+	return local_mq_sendv(mqh, &iov, 1, nowait, notify);
+}
 
 /*
  * Write a message into a shared message queue, gathered from multiple
@@ -491,6 +503,158 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	/* Notify receiver of the newly-written data, and return. */
 	return shm_mq_notify_receiver(mq);
 }
+shm_mq_result
+local_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait, bool notify)
+{
+	shm_mq_result res;
+	shm_mq	   *mq = mqh->mqh_queue;
+	Size		nbytes = 0;
+	Size		bytes_written;
+	int			i;
+	int			which_iov = 0;
+	Size		offset;
+
+	Assert(mq->mq_sender == MyProc);
+
+	/* Compute total size of write. */
+	for (i = 0; i < iovcnt; ++i)
+		nbytes += iov[i].len;
+
+	/* Try to write, or finish writing, the length word into the buffer. */
+	while (!mqh->mqh_length_word_complete)
+	{
+		Assert(mqh->mqh_partial_bytes < sizeof(Size));
+		res = local_mq_send_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
+								((char *) &nbytes) +mqh->mqh_partial_bytes,
+								nowait, &bytes_written);
+
+		if (res == SHM_MQ_DETACHED)
+		{
+			/* Reset state in case caller tries to send another message. */
+			mqh->mqh_partial_bytes = 0;
+			mqh->mqh_length_word_complete = false;
+			return res;
+		}
+		mqh->mqh_partial_bytes += bytes_written;
+
+		if (mqh->mqh_partial_bytes >= sizeof(Size))
+		{
+			Assert(mqh->mqh_partial_bytes == sizeof(Size));
+
+			mqh->mqh_partial_bytes = 0;
+			mqh->mqh_length_word_complete = true;
+		}
+
+		if (res != SHM_MQ_SUCCESS)
+			return res;
+
+		/* Length word can't be split unless bigger than required alignment. */
+		Assert(mqh->mqh_length_word_complete || sizeof(Size) > MAXIMUM_ALIGNOF);
+	}
+
+	/* Write the actual data bytes into the buffer. */
+	Assert(mqh->mqh_partial_bytes <= nbytes);
+	offset = mqh->mqh_partial_bytes;
+	do
+	{
+		Size		chunksize;
+
+		/* Figure out which bytes need to be sent next. */
+		if (offset >= iov[which_iov].len)
+		{
+			offset -= iov[which_iov].len;
+			++which_iov;
+			if (which_iov >= iovcnt)
+				break;
+			continue;
+		}
+
+		/*
+		 * We want to avoid copying the data if at all possible, but every
+		 * chunk of bytes we write into the queue has to be MAXALIGN'd, except
+		 * the last.  Thus, if a chunk other than the last one ends on a
+		 * non-MAXALIGN'd boundary, we have to combine the tail end of its
+		 * data with data from one or more following chunks until we either
+		 * reach the last chunk or accumulate a number of bytes which is
+		 * MAXALIGN'd.
+		 */
+		if (which_iov + 1 < iovcnt &&
+			offset + MAXIMUM_ALIGNOF > iov[which_iov].len)
+		{
+			char		tmpbuf[MAXIMUM_ALIGNOF];
+			int			j = 0;
+
+			for (;;)
+			{
+				if (offset < iov[which_iov].len)
+				{
+					tmpbuf[j] = iov[which_iov].data[offset];
+					j++;
+					offset++;
+					if (j == MAXIMUM_ALIGNOF)
+						break;
+				}
+				else
+				{
+					offset -= iov[which_iov].len;
+					which_iov++;
+					if (which_iov >= iovcnt)
+						break;
+				}
+			}
+
+			res = local_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
+
+			if (res == SHM_MQ_DETACHED)
+			{
+				/* Reset state in case caller tries to send another message. */
+				mqh->mqh_partial_bytes = 0;
+				mqh->mqh_length_word_complete = false;
+				return res;
+			}
+
+			mqh->mqh_partial_bytes += bytes_written;
+			if (res != SHM_MQ_SUCCESS)
+				return res;
+			continue;
+		}
+
+		/*
+		 * If this is the last chunk, we can write all the data, even if it
+		 * isn't a multiple of MAXIMUM_ALIGNOF.  Otherwise, we need to
+		 * MAXALIGN_DOWN the write size.
+		 */
+		chunksize = iov[which_iov].len - offset;
+		if (which_iov + 1 < iovcnt)
+			chunksize = MAXALIGN_DOWN(chunksize);
+		res = local_mq_send_bytes(mqh, chunksize, &iov[which_iov].data[offset],
+								nowait, &bytes_written);
+
+		if (res == SHM_MQ_DETACHED)
+		{
+			/* Reset state in case caller tries to send another message. */
+			mqh->mqh_length_word_complete = false;
+			mqh->mqh_partial_bytes = 0;
+			return res;
+		}
+
+		mqh->mqh_partial_bytes += bytes_written;
+		offset += bytes_written;
+		if (res != SHM_MQ_SUCCESS)
+			return res;
+	} while (mqh->mqh_partial_bytes < nbytes);
+
+	/* Reset for next message. */
+	mqh->mqh_partial_bytes = 0;
+	mqh->mqh_length_word_complete = false;
+
+	/* Notify receiver of the newly-written data, and return. */
+	if(notify)
+	{
+		return shm_mq_notify_receiver(mq);
+	}
+	else return res;
+}
 
 /*
  * Receive a message from a shared message queue.
@@ -933,6 +1097,125 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 	*bytes_written = sent;
 	return SHM_MQ_SUCCESS;
 }
+static shm_mq_result
+local_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
+				  bool nowait, Size *bytes_written)
+{
+	shm_mq	   *mq = mqh->mqh_queue;
+	Size		sent = 0;
+	uint64		used;
+	Size		ringsize = mq->mq_ring_size;
+	Size		available;
+
+	while (sent < nbytes)
+	{
+		bool		detached = false;
+		uint64		rb;
+
+		/* Compute number of ring buffer bytes used and available. */
+		rb = mq->mq_bytes_read;
+		Assert(mq->mq_bytes_written >= rb);
+		used = mq->mq_bytes_written - rb;
+		Assert(used <= ringsize);
+		available = Min(ringsize - used, nbytes - sent);
+
+		if (available == 0 && !mqh->mqh_counterparty_attached)
+		{
+			/*
+			 * The queue is full, so if the receiver isn't yet known to be
+			 * attached, we must wait for that to happen.
+			 */
+			if (nowait)
+			{
+				if (shm_mq_counterparty_gone(mq, mqh->mqh_handle))
+				{
+					*bytes_written = sent;
+					return SHM_MQ_DETACHED;
+				}
+				if (shm_mq_get_receiver(mq) == NULL)
+				{
+					*bytes_written = sent;
+					return SHM_MQ_WOULD_BLOCK;
+				}
+			}
+			else if (!shm_mq_wait_internal(mq, &mq->mq_receiver,
+										   mqh->mqh_handle))
+			{
+				mq->mq_detached = true;
+				*bytes_written = sent;
+				return SHM_MQ_DETACHED;
+			}
+
+			mqh->mqh_counterparty_attached = true;
+
+			/*
+			 * The receiver may have read some data after attaching, so we
+			 * must not wait without rechecking the queue state.
+			 */
+		}
+		else if (available == 0)
+		{
+			shm_mq_result res;
+			/* Let the receiver know that we need them to read some data. */
+			res = shm_mq_notify_receiver(mq);
+
+			if (res != SHM_MQ_SUCCESS)
+			{
+				*bytes_written = sent;
+				return res;
+			}
+
+			/* Skip manipulation of our latch if nowait = true. */
+			if (nowait)
+			{
+				*bytes_written = sent;
+				return SHM_MQ_WOULD_BLOCK;
+			}
+
+			/*
+			 * Wait for our latch to be set.  It might already be set for some
+			 * unrelated reason, but that'll just result in one extra trip
+			 * through the loop.  It's worth it to avoid resetting the latch
+			 * at top of loop, because setting an already-set latch is much
+			 * cheaper than setting one that has been reset.
+			 */
+			WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_EVENT_MQ_SEND);
+
+			/* Reset the latch so we don't spin. */
+			ResetLatch(MyLatch);
+
+			/* An interrupt may have occurred while we were waiting. */
+			CHECK_FOR_INTERRUPTS();
+		}
+		else
+		{
+			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
+			Size		sendnow = Min(available, ringsize - offset);
+
+			/* Write as much data as we can via a single memcpy(). */
+			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
+				   (char *) data + sent, sendnow);
+			sent += sendnow;
+
+			/*
+			 * Update count of bytes written, with alignment padding.  Note
+			 * that this will never actually insert any padding except at the
+			 * end of a run of bytes, because the buffer size is a multiple of
+			 * MAXIMUM_ALIGNOF, and each read is as well.
+			 */
+			Assert(sent == nbytes || sendnow == MAXALIGN(sendnow));
+			mq->mq_bytes_written += MAXALIGN(sendnow);
+
+			/*
+			 * For efficiency, we don't set the reader's latch here.  We'll do
+			 * that only when the buffer fills up or after writing an entire
+			 * message.
+			 */
+		}
+	}
+	*bytes_written = sent;
+	return SHM_MQ_SUCCESS;
+}
 
 /*
  * Wait until at least *nbytesp bytes are available to be read from the
diff --git a/src/include/storage/shm_mq.h b/src/include/storage/shm_mq.h
index 7a37535ab3..74bb681717 100644
--- a/src/include/storage/shm_mq.h
+++ b/src/include/storage/shm_mq.h
@@ -76,6 +76,11 @@ extern shm_mq_result shm_mq_sendv(shm_mq_handle *mqh,
 extern shm_mq_result shm_mq_receive(shm_mq_handle *mqh,
 			   Size *nbytesp, void **datap, bool nowait);
 
+extern shm_mq_result local_mq_send(shm_mq_handle *mqh,
+			Size nbytes, const void *data, bool nowait, bool notify);
+extern shm_mq_result local_mq_sendv(shm_mq_handle *mqh,
+			 shm_mq_iovec *iov, int iovcnt, bool nowait, bool notify);
+
 /* Wait for our counterparty to attach to the queue. */
 extern shm_mq_result shm_mq_wait_for_attach(shm_mq_handle *mqh);
 
#2Robert Haas
robertmhaas@gmail.com
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

On Fri, May 19, 2017 at 7:55 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

While analysing the performance of TPC-H queries for the newly developed
parallel-operators, viz, parallel index, bitmap heap scan, etc. we noticed
that the time taken by gather node is significant. On investigation, as per
the current method it copies each tuple to the shared queue and notifies the
receiver. Since, this copying is done in shared queue, a lot of locking and
latching overhead is there.

So, in this POC patch I tried to copy all the tuples in a local queue thus
avoiding all the locks and latches. Once, the local queue is filled as per
it's capacity, tuples are transferred to the shared queue. Once, all the
tuples are transferred the receiver is sent the notification about the same.

What if, instead of doing this, we switched the shm_mq stuff to use atomics?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#2)
Re: [POC] Faster processing at Gather node

On Fri, May 19, 2017 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 19, 2017 at 7:55 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

While analysing the performance of TPC-H queries for the newly developed
parallel-operators, viz, parallel index, bitmap heap scan, etc. we noticed
that the time taken by gather node is significant. On investigation, as per
the current method it copies each tuple to the shared queue and notifies the
receiver. Since, this copying is done in shared queue, a lot of locking and
latching overhead is there.

So, in this POC patch I tried to copy all the tuples in a local queue thus
avoiding all the locks and latches. Once, the local queue is filled as per
it's capacity, tuples are transferred to the shared queue. Once, all the
tuples are transferred the receiver is sent the notification about the same.

What if, instead of doing this, we switched the shm_mq stuff to use atomics?

That is one of the very first things we tried, but it didn't show any
improvement, probably because sending tuples one by one over shm_mq is
not cheap. Independently, we also tried reducing the frequency of
SetLatch (used to notify the receiver), but that didn't improve the
results either. One thing that could still be tried is to use atomics in
shm_mq together with a reduced notification frequency, but I am not sure
whether that can give us better results than this idea. A couple of
other ideas have also been tried to improve the speed of Gather, such as
avoiding the extra copy of the tuple that we need to make before sending
it (tqueueReceiveSlot -> ExecMaterializeSlot) and increasing the size of
the tuple queue, but none of those showed any noticeable improvement. I
am aware of all this because Dilip and I were involved offlist in
brainstorming ideas with Rafia to improve the speed of Gather. It might
have been better to show the results of the ideas that didn't work out,
but I guess Rafia hasn't shared those on the intuition that nobody would
be interested in hearing what didn't work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#4Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Hi Rafia,

I like the idea of reducing locking overhead by sending tuples in bulk.
The implementation could probably be simpler: you could extend the API
of shm_mq to decouple notifying the receiver from actually putting data
into the queue (i.e., make shm_mq_notify_receiver public and make a
variant of shm_mq_sendv that doesn't send the notification). From Amit's
letter I understand that you have already tried something along these
lines and the performance wasn't good. What was the bottleneck then? If
it's the locking around mq_bytes_read/written, it can be rewritten with
atomics. I think it would be great to try this approach because it
doesn't add much code, doesn't add any additional copying and improves
shm_mq performance in general.

--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Kuzmenkov (#4)
Re: [POC] Faster processing at Gather node

On Fri, Sep 8, 2017 at 11:07 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

Hi Rafia,

I like the idea of reducing locking overhead by sending tuples in bulk. The
implementation could probably be simpler: you could extend the API of shm_mq
to decouple notifying the receiver from actually putting data into the queue
(i.e., make shm_mq_notify_receiver public and make a variant of shm_mq_sendv
that doesn't send the notification).

Rafia can comment on the details, but I would like to bring to your
notice that we need a kind of local buffer (queue) for Gather Merge
processing as well, where the data needs to be fetched in order from the
queues. So, there is always a chance that some of the workers have
filled their queues while waiting for the master to extract the data.
I think the patch posted by Rafia on the nearby thread [1] addresses
both problems in one patch.

[1]: /messages/by-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS=fHiBJmbSOF74aBQ@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#6Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Amit Kapila (#5)
Re: [POC] Faster processing at Gather node

On Sat, Sep 9, 2017 at 8:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 8, 2017 at 11:07 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

Hi Rafia,

I like the idea of reducing locking overhead by sending tuples in bulk. The
implementation could probably be simpler: you could extend the API of shm_mq
to decouple notifying the receiver from actually putting data into the queue
(i.e., make shm_mq_notify_receiver public and make a variant of shm_mq_sendv
that doesn't send the notification).

Rafia can comment on details, but I would like to bring it to your
notice that we need kind of local buffer (queue) for gathermerge
processing as well where the data needs to be fetched in order from
queues. So, there is always a chance that some of the workers have
filled their queues while waiting for the master to extract the data.
I think the patch posted by Rafia on the nearby thread [1] addresses
both the problems by one patch.

[1] - /messages/by-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS=fHiBJmbSOF74aBQ@mail.gmail.com

Thanks, Alexander, for your interest in this work. As rightly pointed
out by Amit, while experimenting with this patch we found that there are
cases in which the master is busy and unable to read tuples from the
shared queue, so a worker gets stuck because it cannot process any more
tuples. While experimenting along these lines, I found that Q12 of TPC-H
shows a great performance improvement when the shared tuple queue size
is increased [1]. It was then that we realised that merging this with
the idea of giving the illusion of a larger tuple queue by means of a
local queue [1] could be more beneficial. To explain precisely what
merging the two ideas means: we now write tuples into the local queue
once the shared queue is full, and as soon as enough tuples have
accumulated in the local queue, we copy them from the local to the
shared queue in a single memcpy call. This gives good performance
improvements in quite a few cases.

I'll be glad if you could have a look at this patch and enlighten me
with your suggestions. :-)

[1]: /messages/by-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS=fHiBJmbSOF74aBQ@mail.gmail.com

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


#7Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Rafia Sabih (#6)
Re: [POC] Faster processing at Gather node

Thanks Rafia, Amit, now I understand the ideas behind the patch better.
I'll see if I can look at the new one.

--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#8Andres Freund
andres@anarazel.de
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Hi Rafia,

On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:

head:
explain analyse select * from t where i < 30000000;
QUERY PLAN

Could you share how exactly you generated the data? Just so others can
compare a bit better with your results?

Regards,

Andres


#9Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Andres Freund (#8)
1 attachment(s)
Re: [POC] Faster processing at Gather node

On Tue, Oct 17, 2017 at 3:22 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Rafia,

On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:

head:
explain analyse select * from t where i < 30000000;
QUERY PLAN

Could you share how exactly you generated the data? Just so others can
compare a bit better with your results?

Sure. I used generate_series(1, 10000000);
Please find the attached script for the detailed steps.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

large_tbl_gen.sql (application/octet-stream)
#10Andres Freund
andres@anarazel.de
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Hi Everyone,

On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:

While analysing the performance of TPC-H queries for the newly developed
parallel-operators, viz, parallel index, bitmap heap scan, etc. we noticed
that the time taken by gather node is significant. On investigation, as per
the current method it copies each tuple to the shared queue and notifies
the receiver. Since, this copying is done in shared queue, a lot of locking
and latching overhead is there.

So, in this POC patch I tried to copy all the tuples in a local queue thus
avoiding all the locks and latches. Once, the local queue is filled as per
it's capacity, tuples are transferred to the shared queue. Once, all the
tuples are transferred the receiver is sent the notification about the same.

With this patch I could see significant improvement in performance for
simple queries,

I've spent some time looking into this, and I'm not quite convinced this
is the right approach. Let me start by describing where I see the
current performance problems around gather stemming from.

The observations here are made using
select * from t where i < 30000000 offset 29999999 - 1;
with Rafia's data. That avoids slowdowns on the leader due to too many
rows printed out. Sometimes I'll also use
SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
on tpch data to show the effects on wider tables.

The precise query doesn't really matter, the observations here are more
general, I hope.

1) nodeGather.c re-projects every row coming from the workers. As far as
I can tell, that is by now always exactly the same targetlist as the one
coming from the worker. Projection capability was added in
8538a6307049590 (without checking whether it's needed, afaict), but I
think it was in turn largely obsoleted by
992b5ba30dcafdc222341505b072a6b009b248a7. My measurements show that
removing the projection yields quite massive speedups in queries like
yours, which is not too surprising.

I suspect this just needs a tlist_matches_tupdesc check + an if around
ExecProject(). And a test: right now the tests pass, unless
force_parallel_mode is used, even when the projection is commented out
unconditionally.

before commenting out nodeGather projection:

   rafia time: 8283.583
   rafia profile:
+   30.62%  postgres  postgres             [.] shm_mq_receive
+   18.49%  postgres  postgres             [.] s_lock
+   10.08%  postgres  postgres             [.] SetLatch
-    7.02%  postgres  postgres             [.] slot_deform_tuple
   - slot_deform_tuple
      - 88.01% slot_getsomeattrs
           ExecInterpExpr
           ExecGather
           ExecLimit
   lineitem time: 8448.468
   lineitem profile:
+   24.63%  postgres  postgres             [.] slot_deform_tuple
+   24.43%  postgres  postgres             [.] shm_mq_receive
+   17.36%  postgres  postgres             [.] ExecInterpExpr
+    7.41%  postgres  postgres             [.] s_lock
+    5.73%  postgres  postgres             [.] SetLatch
after:
   rafia time: 6660.224
   rafia profile:
+   36.77%  postgres  postgres              [.] shm_mq_receive
+   19.33%  postgres  postgres              [.] s_lock
+   13.14%  postgres  postgres              [.] SetLatch
+    9.22%  postgres  postgres              [.] AllocSetReset
+    4.27%  postgres  postgres              [.] ExecGather
+    2.79%  postgres  postgres              [.] AllocSetAlloc
   lineitem time: 4507.416
   lineitem profile:
+   34.81%  postgres  postgres            [.] shm_mq_receive
+   15.45%  postgres  postgres            [.] s_lock
+   13.38%  postgres  postgres            [.] SetLatch
+    9.87%  postgres  postgres            [.] AllocSetReset
+    5.82%  postgres  postgres            [.] ExecGather

As is quite clearly visible, avoiding the projection yields some major
speedups.

The following analysis here has the projection removed.

2) The spinlocks on both the sending and receiving sides are quite hot:

   rafia query leader:
+   36.16%  postgres  postgres            [.] shm_mq_receive
+   19.49%  postgres  postgres            [.] s_lock
+   13.24%  postgres  postgres            [.] SetLatch

The presence of s_lock shows us that we're clearly often contending
on spinlocks, given that's the slow-path for SpinLockAcquire(). In
shm_mq_receive the instruction profile shows:

│ SpinLockAcquire(&mq->mq_mutex);
│1 5ab: mov $0xa9b580,%ecx
│ mov $0x4a4,%edx
│ mov $0xa9b538,%esi
│ mov %r15,%rdi
│ → callq s_lock
│ ↑ jmpq 2a1
│ tas():
│1 5c7: mov $0x1,%eax
32.83 │ lock xchg %al,(%r15)
│ shm_mq_inc_bytes_read():
│ SpinLockAcquire(&mq->mq_mutex);
and
0.01 │ pop %r15
0.04 │ ← retq
│ nop
│ tas():
│1 338: mov $0x1,%eax
17.59 │ lock xchg %al,(%r15)
│ shm_mq_get_bytes_written():
│ SpinLockAcquire(&mq->mq_mutex);
0.05 │ test %al,%al
0.01 │ ↓ jne 448
│ v = mq->mq_bytes_written;

    rafia query worker:
+   33.00%  postgres  postgres            [.] shm_mq_send_bytes
+    9.90%  postgres  postgres            [.] s_lock
+    7.74%  postgres  postgres            [.] shm_mq_send
+    5.40%  postgres  postgres            [.] ExecInterpExpr
+    5.34%  postgres  postgres            [.] SetLatch

Again, we have strong indicators of a lot of spinlock
contention. The instruction profiles show the same:

shm_mq_send_bytes
│ shm_mq_inc_bytes_written(mq, MAXALIGN(sendnow));
│ and $0xfffffffffffffff8,%r15
│ tas():
0.08 │ mov %ebp,%eax
31.07 │ lock xchg %al,(%r14)
│ shm_mq_inc_bytes_written():
│ * Increment the number of bytes written.
│ */

and

│3 98: cmp %r13,%rbx
0.70 │ ↓ jae 430
│ tas():
0.12 │1 a1: mov %ebp,%eax
28.53 │ lock xchg %al,(%r14)
│ shm_mq_get_bytes_read():
│ SpinLockAcquire(&mq->mq_mutex);
│ test %al,%al
│ ↓ jne 298
│ v = mq->mq_bytes_read;

shm_mq_send:
│ tas():
50.73 │ lock xchg %al,0x0(%r13)
│ shm_mq_notify_receiver():
│ shm_mq_notify_receiver(volatile shm_mq *mq)
│ {
│ PGPROC *receiver;
│ bool detached;

My interpretation here is that it's not just the effect of the
locking causing the slowdown, but to a significant degree the effect
of the implied bus lock.

To test that theory, here are the timings for
1) spinlocks present
time: 6593.045
2) spinlock acquisition replaced by *full* memory barriers, which on x86 is a lock; addl $0,0(%%rsp)
time: 5159.306
3) spinlocks replaced by read/write barriers as appropriate.
time: 4687.610

By my understanding of shm_mq.c's logic, something like 3) ought to
be doable in a correct manner. There should, in normal
circumstances, only be one end modifying each of the protected
variables. Doing so instead of using full-blown atomics with locked
instructions is very likely to yield considerably better performance.
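To sketch the idea concretely (this is a standalone toy, not shm_mq.c itself; the names spsc_queue, spsc_send, and spsc_recv are invented for illustration): in a single-producer/single-consumer ring, each counter is only ever advanced by one side, so a release store when publishing and an acquire load when reading the other side's counter suffice, with no locked instructions on the fast path:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8

typedef struct
{
	_Atomic uint64_t bytes_written;	/* advanced only by the producer */
	_Atomic uint64_t bytes_read;	/* advanced only by the consumer */
	int			ring[RING_SIZE];
} spsc_queue;

static bool
spsc_send(spsc_queue *q, int v)
{
	uint64_t	wr = atomic_load_explicit(&q->bytes_written, memory_order_relaxed);
	uint64_t	rd = atomic_load_explicit(&q->bytes_read, memory_order_acquire);

	if (wr - rd == RING_SIZE)
		return false;			/* full */
	q->ring[wr % RING_SIZE] = v;
	/* release store: the payload must be visible before the new counter */
	atomic_store_explicit(&q->bytes_written, wr + 1, memory_order_release);
	return true;
}

static bool
spsc_recv(spsc_queue *q, int *v)
{
	uint64_t	rd = atomic_load_explicit(&q->bytes_read, memory_order_relaxed);
	uint64_t	wr = atomic_load_explicit(&q->bytes_written, memory_order_acquire);

	if (wr == rd)
		return false;			/* empty */
	*v = q->ring[rd % RING_SIZE];
	/* release store: the slot may only be reused once this is seen */
	atomic_store_explicit(&q->bytes_read, rd + 1, memory_order_release);
	return true;
}
```

The single-threaded calls above only exercise the API shape, of course; the payoff is that on a weakly ordered machine the acquire/release pairs provide exactly the ordering the queue needs, and nothing more.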

The top-level profile after 3 is:

   leader:
+   25.89%  postgres  postgres          [.] shm_mq_receive
+   24.78%  postgres  postgres          [.] SetLatch
+   14.63%  postgres  postgres          [.] AllocSetReset
+    7.31%  postgres  postgres          [.] ExecGather
   worker:
+   14.02%  postgres  postgres            [.] ExecInterpExpr
+   11.63%  postgres  postgres            [.] shm_mq_send_bytes
+   11.25%  postgres  postgres            [.] heap_getnext
+    6.78%  postgres  postgres            [.] SetLatch
+    6.38%  postgres  postgres            [.] slot_deform_tuple

still a lot of cycles in the queue code, but proportionally less.

4) Modulo computations in shm_mq are expensive:

│ shm_mq_send_bytes():
│ Size offset = mq->mq_bytes_written % (uint64) ringsize;
0.12 │1 70: xor %edx,%edx
│ Size sendnow = Min(available, ringsize - offset);
│ mov %r12,%rsi
│ Size offset = mq->mq_bytes_written % (uint64) ringsize;
43.75 │ div %r12
│ memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
7.23 │ movzbl 0x31(%r15),%eax

│ shm_mq_receive_bytes():
│ used = written - mq->mq_bytes_read;
1.17 │ sub %rax,%rcx
│ offset = mq->mq_bytes_read % (uint64) ringsize;
18.49 │ div %rbp
│ mov %rdx,%rdi

That we end up with a full-blown div makes sense - the compiler can't
know anything about ringsize, therefore it can't optimize to a mask.
I think we should force the size of the ringbuffer to be a power of
two, and use a mask instead of a size for the buffer.
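For illustration (the function names here are made up, not a proposed shm_mq API), with a power-of-two ring size the offset computation reduces to an AND with a mask, where a modulo by a runtime-variable size compiles to a div:

```c
#include <stdint.h>

/* Round a requested ring size up to the next power of two. */
static uint64_t
next_power_of_two(uint64_t n)
{
	n--;
	n |= n >> 1;
	n |= n >> 2;
	n |= n >> 4;
	n |= n >> 8;
	n |= n >> 16;
	n |= n >> 32;
	return n + 1;
}

/*
 * Equivalent of "bytes_written % ringsize" when ringsize is a power of
 * two: a single AND instruction instead of a div.
 */
static uint64_t
ring_offset(uint64_t bytes_written, uint64_t ringsize)
{
	return bytes_written & (ringsize - 1);
}
```

A shm_mq-style API would presumably round the caller's requested size up (or down) at creation time and store the mask, so callers never see the restriction directly.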

5) There's a *lot* of latch interactions. The biggest issue actually is
the memory barrier implied by a SetLatch; waiting for the latch
barely shows up.

from 4) above:

   leader:
+   25.89%  postgres  postgres          [.] shm_mq_receive
+   24.78%  postgres  postgres          [.] SetLatch
+   14.63%  postgres  postgres          [.] AllocSetReset
+    7.31%  postgres  postgres          [.] ExecGather

│ 0000000000781b10 <SetLatch>:
│ SetLatch():
│ /*
│ * The memory barrier has to be placed here to ensure that any flag
│ * variables possibly changed by this process have been flushed to main
│ * memory, before we check/set is_set.
│ */
│ pg_memory_barrier();
77.43 │ lock addl $0x0,(%rsp)

│ /* Quick exit if already set */
│ if (latch->is_set)
0.12 │ mov (%rdi),%eax

Commenting out the memory barrier - which is NOT CORRECT - improves
timing:
before: 4675.626
after: 4125.587

The correct fix obviously is not to break latch correctness. I think
the big problem here is that we perform a SetLatch() for every read
from the queue.

I think we should
a) introduce a batch variant for receiving like:

extern shm_mq_result shm_mq_receivev(shm_mq_handle *mqh,
shm_mq_iovec *iov, int *iovcnt,
bool nowait)

which then only does the SetLatch() at the end. And maybe if it
wrapped around.

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.
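As a toy model of a) (not the real shm_mq API; toy_queue and toy_receivev are invented names, and a plain counter stands in for SetLatch()), the point is simply that draining a batch of entries triggers one wakeup instead of one per entry:

```c
#include <stddef.h>

typedef struct
{
	int			items[64];
	size_t		head;			/* next unread entry */
	size_t		tail;			/* next free slot */
	int			notifications;	/* stand-in for SetLatch() calls */
} toy_queue;

/*
 * Drain up to 'max' queued entries in one call, notifying the other
 * side once at the end, where a per-tuple receive would have
 * notified for every entry.
 */
static size_t
toy_receivev(toy_queue *q, int *iov, size_t max)
{
	size_t		n = 0;

	while (q->head < q->tail && n < max)
		iov[n++] = q->items[q->head++];

	if (n > 0)
		q->notifications++;		/* one wakeup per batch, not per entry */
	return n;
}
```

In the real thing the iovec entries would of course point into the ring buffer rather than being copied, and a second notification might be needed when the read position wraps around.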

I've not benchmarked this, but I'm pretty certain that the benefit
isn't just going to be the reduced cost of SetLatch() calls, but also
increased performance due to fewer context switches.

6) I've observed, using strace, debug outputs with timings, and top with
a short interval, that quite often only one backend has sufficient
work, while other backends are largely idle.

I think the problem here is that the "anti round robin" provisions from
bc7fcab5e36b9597857, while much better than the previous state, have
swung a bit too far in the other direction. Especially if we were
to introduce batching as I suggest in 5), but even without, this
leads to back-and-forth on half-empty queues between the gatherstate->nextreader
backend and the leader.

I'm not 100% certain what the right fix here is.

One fairly drastic solution would be to move away from a
single-producer-single-consumer style, per-worker queue, to a global
queue. That'd avoid fairness issues between the individual workers,
at the price of potential added contention. One disadvantage is that
such a combined queue approach is not easily applicable for gather
merge.

One less drastic approach would be to try to drain the queue
fully in one batch, and then move to the next queue. That'd avoid
triggering a small wakeup for each individual tuple, as one
currently would get without the 'stickiness'.

It might also be a good idea to use a more directed form of wakeups,
e.g. using a per-worker latch + a wait event set, to avoid iterating
over all workers.
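A rough analogy for the directed-wakeup idea (a standalone sketch using pipes and poll() as stand-ins for per-worker latches and a WaitEventSet; ready_worker is an invented name): the leader learns exactly which worker has data, rather than rescanning every worker's queue:

```c
#include <poll.h>
#include <unistd.h>

/*
 * Block until some worker's "latch" (here: pipe) is readable, and
 * return its index, so the leader can drain just that worker's queue.
 */
static int
ready_worker(struct pollfd *fds, int nworkers)
{
	int			i;

	if (poll(fds, nworkers, -1) <= 0)
		return -1;
	for (i = 0; i < nworkers; i++)
		if (fds[i].revents & POLLIN)
			return i;			/* drain only this worker's queue */
	return -1;
}
```

A real implementation would wait on the process latch as well, since error propagation from workers relies on it, and would handle multiple ready workers per wakeup.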

Unfortunately the patch's "local worker queue" concept seems, to me,
like it's not quite addressing the structural issues, instead opting to
address them by adding another layer of queuing. I suspect that if we'd
go for the above solutions there'd be only very small benefit in
implementing such per-worker local queues.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
Re: [POC] Faster processing at Gather node

Hi,

On 2017-10-17 14:39:57 -0700, Andres Freund wrote:

I've spent some time looking into this, and I'm not quite convinced this
is the right approach. Let me start by describing where I see the
current performance problems around gather stemming from.

One further approach to several of these issues could also be to change
things a bit more radically:

Instead of the current shm_mq + tqueue.c, have a drastically simpler
queue, that just stores fixed width dsa_pointers. Dealing with that
queue will be quite a bit faster. In that queue one would store dsa.c
managed pointers to tuples.

One thing that makes that attractive is that that'd move a bunch of
copying in the leader process solely to the worker processes, because
the leader could just convert the dsa_pointer into a local pointer and
hand that upwards the execution tree.

We'd possibly need some halfway clever way to reuse dsa allocations, but
that doesn't seem impossible.

Greetings,

Andres Freund


#12Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
Re: [POC] Faster processing at Gather node

On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund <andres@anarazel.de> wrote:

The precise query doesn't really matter, the observations here are more
general, I hope.

1) nodeGather.c re-projects every row from workers. As far as I can tell
that's now always exactly the same targetlist as it's coming from the
worker. Projection capability was added in 8538a6307049590 (without
checking whether it's needed afaict), but I think it was in turn often
obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7. My
measurement shows that removing the projection yields quite massive
speedups in queries like yours, which is not too surprising.

That seems like an easy and worthwhile optimization.

I suspect this just needs a tlist_matches_tupdesc check + an if
around ExecProject(). And a test, right now tests pass unless
force_parallel_mode is used even if just commenting out the
projection unconditionally.

So, for this to fail, we'd need a query that uses parallelism but
where the target list contains a parallel-restricted function. Also,
the function should really be such that we'll reliably get a failure
rather than only with some small probability. I'm not quite sure how
to put together such a test case, but there's probably some way.

2) The spinlocks on both the sending and receiving sides are quite hot:

rafia query leader:
+   36.16%  postgres  postgres            [.] shm_mq_receive
+   19.49%  postgres  postgres            [.] s_lock
+   13.24%  postgres  postgres            [.] SetLatch

To test that theory, here are the timings for
1) spinlocks present
time: 6593.045
2) spinlock acquisition replaced by *full* memory barriers, which on x86 is a lock; addl $0,0(%%rsp)
time: 5159.306
3) spinlocks replaced by read/write barriers as appropriate.
time: 4687.610

By my understanding of shm_mq.c's logic, something like 3) ought to
be doable in a correct manner. There should, in normal
circumstances, only be one end modifying each of the protected
variables. Doing so instead of using full-blown atomics with locked
instructions is very likely to yield considerably better performance.

I think the sticking point here will be the handling of the
mq_detached flag. I feel like I fixed a bug at some point where this
had to be checked or set under the lock at the same time we were
checking or setting mq_bytes_read and/or mq_bytes_written, but I don't
remember the details. I like the idea, though.

Not sure what happened to #3 on your list... you went from #2 to #4.

4) Modulo computations in shm_mq are expensive:

that we end up with a full blown div makes sense - the compiler can't
know anything about ringsize, therefore it can't optimize to a mask.
I think we should force the size of the ringbuffer to be a power of
two, and use a mask instead of a size for the buffer.

This seems like it would require some redesign. Right now we let the
caller choose any size they want and subtract off our header size to
find the actual queue size. We can waste up to MAXALIGN-1 bytes, but
that's not much. This would waste up to half the bytes provided,
which is probably not cool.

5) There's a *lot* of latch interactions. The biggest issue actually is
the memory barrier implied by a SetLatch, waiting for the latch
barely shows up.

Commenting out the memory barrier - which is NOT CORRECT - improves
timing:
before: 4675.626
after: 4125.587

The correct fix obviously is not to break latch correctness. I think
the big problem here is that we perform a SetLatch() for every read
from the queue.

I think it's a little bit of an overstatement to say that commenting
out the memory barrier is not correct. When we added that code, we
removed this comment:

- * Presently, when using a shared latch for interprocess signalling, the
- * flag variable(s) set by senders and inspected by the wait loop must
- * be protected by spinlocks or LWLocks, else it is possible to miss events
- * on machines with weak memory ordering (such as PPC). This restriction
- * will be lifted in future by inserting suitable memory barriers into
- * SetLatch and ResetLatch.

It seems to me that in any case where the data is protected by an
LWLock, the barriers we've added to SetLatch and ResetLatch are
redundant. I'm not sure if that's entirely true in the spinlock case,
because S_UNLOCK() is only documented to have release semantics, so
maybe the load of latch->is_set could be speculated before the lock is
dropped. But I do wonder if we're just multiplying barriers endlessly
instead of trying to think of ways to minimize them (e.g. have a
variant of SpinLockRelease that acts as a full barrier instead of a
release barrier, and then avoid a second barrier when checking the
latch state).

All that having been said, a batch variant for reading tuples in bulk
might make sense. I think when I originally wrote this code I was
thinking that one process might be filling the queue while another
process was draining it, and that it might therefore be important to
free up space as early as possible. But maybe that's not a very good
intuition.

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.

Batching them with what? I hate to postpone sending tuples we've got;
that sounds nice in the case where we're sending tons of tuples at
high speed, but there might be other cases where it makes the leader
wait.

6) I've observed, using strace, debug outputs with timings, and top with
a short interval, that quite often only one backend has sufficient
work, while other backends are largely idle.

Doesn't that just mean we're bad at choosing how many workers to use?
If one worker can't outrun the leader, it's better to have the other
workers sleep and keep the one lucky worker on CPU than to keep
context switching. Or so I would assume.

One fairly drastic solution would be to move away from a
single-producer-single-consumer style, per-worker queue, to a global
queue. That'd avoid fairness issues between the individual workers,
at the price of potential added contention. One disadvantage is that
such a combined queue approach is not easily applicable for gather
merge.

It might also lead to more contention.

One less drastic approach would be to try to drain the queue
fully in one batch, and then move to the next queue. That'd avoid
triggering a small wakeup for each individual tuple, as one
currently would get without the 'stickiness'.

I don't know if that is better but it seems worth a try.

It might also be a good idea to use a more directed form of wakeups,
e.g. using a per-worker latch + a wait event set, to avoid iterating
over all workers.

I don't follow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#13Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#12)
Re: [POC] Faster processing at Gather node

Hi,

On 2017-10-18 15:46:39 -0400, Robert Haas wrote:

2) The spinlocks on both the sending and receiving sides are quite hot:

rafia query leader:
+   36.16%  postgres  postgres            [.] shm_mq_receive
+   19.49%  postgres  postgres            [.] s_lock
+   13.24%  postgres  postgres            [.] SetLatch

To test that theory, here are the timings for
1) spinlocks present
time: 6593.045
2) spinlock acquisition replaced by *full* memory barriers, which on x86 is a lock; addl $0,0(%%rsp)
time: 5159.306
3) spinlocks replaced by read/write barriers as appropriate.
time: 4687.610

By my understanding of shm_mq.c's logic, something like 3) ought to
be doable in a correct manner. There should, in normal
circumstances, only be one end modifying each of the protected
variables. Doing so instead of using full-blown atomics with locked
instructions is very likely to yield considerably better performance.

I think the sticking point here will be the handling of the
mq_detached flag. I feel like I fixed a bug at some point where this
had to be checked or set under the lock at the same time we were
checking or setting mq_bytes_read and/or mq_bytes_written, but I don't
remember the details. I like the idea, though.

Hm. I'm a bit confused/surprised by that. What'd be the worst that can
happen if we don't immediately detect that the other side is detached?
At least if we only do so in the non-blocking paths, the worst that
could happen seems to be that we read/write at most one superfluous
queue entry, rather than reporting an error? Or is the concern that
detaching might be detected *too early*, before reading the last entry
from a queue?

Not sure what happened to #3 on your list... you went from #2 to #4.

Threes are boring.

4) Modulo computations in shm_mq are expensive:

that we end up with a full blown div makes sense - the compiler can't
know anything about ringsize, therefore it can't optimize to a mask.
I think we should force the size of the ringbuffer to be a power of
two, and use a mask instead of a size for the buffer.

This seems like it would require some redesign. Right now we let the
caller choose any size they want and subtract off our header size to
find the actual queue size. We can waste up to MAXALIGN-1 bytes, but
that's not much. This would waste up to half the bytes provided,
which is probably not cool.

Yea, I think it'd require a shm_mq_estimate_size(Size queuesize), that
returns the next power-of-two queuesize + overhead.

5) There's a *lot* of latch interactions. The biggest issue actually is
the memory barrier implied by a SetLatch, waiting for the latch
barely shows up.

Commenting out the memory barrier - which is NOT CORRECT - improves
timing:
before: 4675.626
after: 4125.587

The correct fix obviously is not to break latch correctness. I think
the big problem here is that we perform a SetLatch() for every read
from the queue.

I think it's a little bit of an overstatement to say that commenting
out the memory barrier is not correct. When we added that code, we
removed this comment:

- * Presently, when using a shared latch for interprocess signalling, the
- * flag variable(s) set by senders and inspected by the wait loop must
- * be protected by spinlocks or LWLocks, else it is possible to miss events
- * on machines with weak memory ordering (such as PPC). This restriction
- * will be lifted in future by inserting suitable memory barriers into
- * SetLatch and ResetLatch.

It seems to me that in any case where the data is protected by an
LWLock, the barriers we've added to SetLatch and ResetLatch are
redundant. I'm not sure if that's entirely true in the spinlock case,
because S_UNLOCK() is only documented to have release semantics, so
maybe the load of latch->is_set could be speculated before the lock is
dropped. But I do wonder if we're just multiplying barriers endlessly
instead of trying to think of ways to minimize them (e.g. have a
variant of SpinLockRelease that acts as a full barrier instead of a
release barrier, and then avoid a second barrier when checking the
latch state).

I'm not convinced by this. Imo the multiplying largely comes from
superfluous actions, like the per-entry SetLatch calls here, rather than
per batch.

After all I'd benchmarked this *after* an experimental conversion of
shm_mq to not use spinlocks - so there's actually no external barrier
providing these guarantees that could be combined with SetLatch()'s
barrier.

Presumably part of the cost here comes from the fact that the barriers
actually do have an influence over the ordering.

All that having been said, a batch variant for reading tuples in bulk
might make sense. I think when I originally wrote this code I was
thinking that one process might be filling the queue while another
process was draining it, and that it might therefore be important to
free up space as early as possible. But maybe that's not a very good
intuition.

I think that's a sensible goal, but I don't think that has to mean that
the draining has to happen after every entry. If you'd e.g. have a
shm_mq_receivev() with 16 iovecs, that'd commonly be only part of a
single tqueue mq (unless your tuples are > 4k). I'll note that afaict
shm_mq_sendv() already batches its SetLatch() calls.

I think one important thing a batched drain can avoid is that a worker
awakes to just put one new tuple into the queue and then sleep
again. That's kinda expensive.

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.

Batching them with what? I hate to postpone sending tuples we've got;
that sounds nice in the case where we're sending tons of tuples at
high speed, but there might be other cases where it makes the leader
wait.

Yea, that'd need some smarts. How about doing something like batching up
locally as long as the queue contains less than one average sized batch?
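That heuristic could be as simple as the following sketch (should_flush and all its parameters are invented names, just to pin down the idea): a worker keeps batching locally while the shared queue still holds at least one average-sized batch for the leader to chew on, and flushes as soon as the leader might otherwise go idle or the local buffer fills up:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a worker should flush its locally batched tuples to
 * the shared queue now, or keep batching.
 */
static bool
should_flush(size_t shared_bytes_queued, size_t avg_batch_bytes,
			 size_t local_bytes_buffered, size_t local_limit)
{
	if (local_bytes_buffered >= local_limit)
		return true;			/* local buffer is full */

	/* flush if the leader is about to run out of queued work */
	return shared_bytes_queued < avg_batch_bytes;
}
```

The tricky part in practice would be tracking avg_batch_bytes cheaply and reading the shared fill level without reintroducing the cacheline traffic the batching is meant to avoid.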

6) I've observed, using strace, debug outputs with timings, and top with
a short interval, that quite often only one backend has sufficient
work, while other backends are largely idle.

Doesn't that just mean we're bad at choosing how many workers to use?
If one worker can't outrun the leader, it's better to have the other
workers sleep and keep the one lucky worker on CPU than to keep
context switching. Or so I would assume.

No, I don't think that's necessarily true. If that worker's queue is full
every time, then yes. But I think a common scenario is that the
"current" worker only has a small portion of the queue filled. Draining
that immediately just leads to increased cacheline bouncing.

I'd not previously thought about this, but won't staying sticky to the
current worker potentially cause pins on individual tuples to be held
for a long time by workers not making any progress?

It might also be a good idea to use a more directed form of wakeups,
e.g. using a per-worker latch + a wait event set, to avoid iterating
over all workers.

I don't follow.

Well, right now we're busily checking each worker's queue. That's fine
with a handful of workers, but starts to become not that cheap pretty
soon afterwards. In the surely common case where the workers are the
bottleneck (because that's when parallelism is worthwhile), we'll check
each worker's queue once one of them returned a single tuple. It'd not
be a stupid idea to have a per-worker latch and wait for the latches of
all workers at once. Then targetedly drain the queues of the workers
that WaitEventSetWait(nevents = nworkers) signalled as ready.

Greetings,

Andres Freund


#14Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#13)
Re: [POC] Faster processing at Gather node

On Wed, Oct 18, 2017 at 4:30 PM, Andres Freund <andres@anarazel.de> wrote:

Hm. I'm a bit confused/surprised by that. What'd be the worst that can
happen if we don't immediately detect that the other side is detached?
At least if we only do so in the non-blocking paths, the worst that
could happen seems to be that we read/write at most one superfluous
queue entry, rather than reporting an error? Or is the concern that
detaching might be detected *too early*, before reading the last entry
from a queue?

Detaching too early is definitely a problem. A worker is allowed to
start up, write all of its results into the queue, and then detach
without waiting for the leader to read those results. (Reading
messages that weren't really written would be a problem too, of
course.)

I'm not convinced by this. Imo the multiplying largely comes from
superfluous actions, like the per-entry SetLatch calls here, rather than
per batch.

After all I'd benchmarked this *after* an experimental conversion of
shm_mq to not use spinlocks - so there's actually no external barrier
providing these guarantees that could be combined with SetLatch()'s
barrier.

OK.

All that having been said, a batch variant for reading tuples in bulk
might make sense. I think when I originally wrote this code I was
thinking that one process might be filling the queue while another
process was draining it, and that it might therefore be important to
free up space as early as possible. But maybe that's not a very good
intuition.

I think that's a sensible goal, but I don't think that has to mean that
the draining has to happen after every entry. If you'd e.g. have a
shm_mq_receivev() with 16 iovecs, that'd commonly be only part of a
single tqueue mq (unless your tuples are > 4k). I'll note that afaict
shm_mq_sendv() already batches its SetLatch() calls.

But that's a little different -- shm_mq_sendv() sends one message
collected from multiple places. There's no more reason for it to wake
up the receiver before the whole message is written than there would
be for shm_mq_send(); it's the same problem.

I think one important thing a batched drain can avoid is that a worker
awakes to just put one new tuple into the queue and then sleep
again. That's kinda expensive.

Yes. Or - part of a tuple, which is worse.

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.

Batching them with what? I hate to postpone sending tuples we've got;
that sounds nice in the case where we're sending tons of tuples at
high speed, but there might be other cases where it makes the leader
wait.

Yea, that'd need some smarts. How about doing something like batching up
locally as long as the queue contains less than one average sized batch?

I don't follow.

No, I don't think that's necessarily true. If that worker's queue is full
every time, then yes. But I think a common scenario is that the
"current" worker only has a small portion of the queue filled. Draining
that immediately just leads to increased cacheline bouncing.

Hmm, OK.

I'd not previously thought about this, but won't staying sticky to the
current worker potentially cause pins on individual tuples to be held
for a long time by workers not making any progress?

Yes.

It might also be a good idea to use a more directed form of wakeups,
e.g. using a per-worker latch + a wait event set, to avoid iterating
over all workers.

I don't follow.

Well, right now we're busily checking each worker's queue. That's fine
with a handful of workers, but starts to become not that cheap pretty
soon afterwards. In the surely common case where the workers are the
bottleneck (because that's when parallelism is worthwhile), we'll check
each worker's queue once one of them returned a single tuple. It'd not
be a stupid idea to have a per-worker latch and wait for the latches of
all workers at once. Then targetedly drain the queues of the workers
that WaitEventSetWait(nevents = nworkers) signalled as ready.

Hmm, interesting. But we can't completely ignore the process latch
either, since among other things a worker erroring out and propagating
the error to the leader relies on that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#10)
Re: [POC] Faster processing at Gather node

On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Everyone,

On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:

While analysing the performance of TPC-H queries for the newly developed
parallel-operators, viz, parallel index, bitmap heap scan, etc. we noticed
that the time taken by gather node is significant. On investigation, as per
the current method it copies each tuple to the shared queue and notifies
the receiver. Since, this copying is done in shared queue, a lot of locking
and latching overhead is there.

So, in this POC patch I tried to copy all the tuples in a local queue thus
avoiding all the locks and latches. Once, the local queue is filled as per
it's capacity, tuples are transferred to the shared queue. Once, all the
tuples are transferred the receiver is sent the notification about the same.

With this patch I could see significant improvement in performance for
simple queries,

I've spent some time looking into this, and I'm not quite convinced this
is the right approach.

As per my understanding, the patch in this thread is dead (not
required) after the patch posted by Rafia in thread "Effect of
changing the value for PARALLEL_TUPLE_QUEUE_SIZE" [1]. There seem to
be two related problems in this area: the first is the shm queue
communication overhead, and the second is that workers start to stall
when the shm queue gets full. We can observe the first problem in
simple queries that use gather, and the second in gather-merge kind of
scenarios. We are trying to resolve both problems with the patch
posted in the other thread. I think there is some similarity with the
patch posted on this thread, but it is different. I have mentioned
something similar up thread as well.

Let me start by describing where I see the
current performance problems around gather stemming from.

The observations here are made using
select * from t where i < 30000000 offset 29999999 - 1;
with Rafia's data. That avoids slowdowns on the leader due to too many
rows printed out. Sometimes I'll also use
SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
on tpch data to show the effects on wider tables.

The precise query doesn't really matter, the observations here are more
general, I hope.

1) nodeGather.c re-projects every row from workers. As far as I can tell
that's now always exactly the same targetlist as the one coming from the
worker. Projection capability was added in 8538a6307049590 (without
checking whether it's needed, afaict), but I think it was in turn often
obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7. My
measurements show that removing the projection yields quite massive
speedups in queries like yours, which is not too surprising.

I suspect this just needs a tlist_matches_tupdesc check + an if
around ExecProject(). And a test: right now the tests pass, even with the
projection commented out unconditionally, unless force_parallel_mode
is used.

+1. I think we should do something to avoid this.
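The shape of the suggested check can be illustrated with a standalone sketch (hypothetical names; plain ints stand in for Vars and tuple-descriptor attributes): projection is skipped when the targetlist is a simple identity mapping over the input descriptor, which mirrors what tlist_matches_tupdesc verifies for real Vars.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if the targetlist is a plain identity projection over a
 * descriptor with tupdesc_natts attributes, i.e. attnos 1..n in order.
 * In that case the caller can set ps_ProjInfo = NULL and skip
 * ExecProject() entirely.
 */
static bool
tlist_is_identity(const int *tlist_attnos, int tlist_len, int tupdesc_natts)
{
    if (tlist_len != tupdesc_natts)
        return false;               /* tlist too short or too long */
    for (int i = 0; i < tlist_len; i++)
        if (tlist_attnos[i] != i + 1)
            return false;           /* out of order: projection needed */
    return true;
}
```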

Commenting out the memory barrier - which is NOT CORRECT - improves
timing:
before: 4675.626
after: 4125.587

The correct fix obviously is not to break latch correctness. I think
the big problem here is that we perform a SetLatch() for every read
from the queue.

I think we should
a) introduce a batch variant for receiving like:

extern shm_mq_result shm_mq_receivev(shm_mq_handle *mqh,
shm_mq_iovec *iov, int *iovcnt,
bool nowait)

which then only does the SetLatch() at the end. And maybe if it
wrapped around.

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.

I think the patch posted in the other thread has tried to achieve such
batching by using local queues; maybe we should try some other way.
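To make the contrast concrete, here is a standalone C model (not the proposed API, whose exact signature is only sketched in the prototype above) of per-message receive versus a receivev-style batch drain that sets the sender's latch once per batch rather than once per message.

```c
#include <assert.h>

#define MAXMSGS 128

static int queue[MAXMSGS];
static int nwritten = 0, nread = 0;
static int sender_latch_sets = 0;  /* stands in for SetLatch(&sender->procLatch) */

/* One-message receive: notifies the sender after every read.
 * Assumes a message is available. */
static int receive_one(void)
{
    int msg = queue[nread++];
    sender_latch_sets++;
    return msg;
}

/* Batch receive in the spirit of shm_mq_receivev(): drain everything
 * available into iov[], then set the sender's latch once at the end. */
static int receivev(int *iov, int iovmax)
{
    int n = 0;
    while (nread < nwritten && n < iovmax)
        iov[n++] = queue[nread++];
    if (n > 0)
        sender_latch_sets++;       /* single notification for the batch */
    return n;
}
```

Draining ten queued messages this way costs one latch set instead of ten, which is exactly the SetLatch traffic the proposal aims to eliminate.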

Unfortunately the patch's "local worker queue" concept seems, to me,
like it's not quite addressing the structural issues, instead opting to
address them by adding another layer of queuing.

That is done to batch the tuples while sending them. Sure, we
can do some of the other things as well, but I think the main
advantage comes from batching the tuples in a smart way while sending
them, and once that is done, we might not need many of the other
optimizations.

[1]: /messages/by-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS=fHiBJmbSOF74aBQ@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#12)
Re: [POC] Faster processing at Gather node

On Thu, Oct 19, 2017 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund <andres@anarazel.de> wrote:

b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
queue whenever it's not empty when a tuple is ready.

Batching them with what? I hate to postpone sending tuples we've got;
that sounds nice in the case where we're sending tons of tuples at
high speed, but there might be other cases where it makes the leader
wait.

I think Rafia's latest patch on the thread [1] has done something
similar where the tuples are sent till there is a space in shared
memory queue and then turn to batching the tuples using local queues.

[1]: /messages/by-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS=fHiBJmbSOF74aBQ@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: [POC] Faster processing at Gather node

On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund <andres@anarazel.de> wrote:

2) The spinlocks on both the sending and receiving side are quite hot:

rafia query leader:
+   36.16%  postgres  postgres            [.] shm_mq_receive
+   19.49%  postgres  postgres            [.] s_lock
+   13.24%  postgres  postgres            [.] SetLatch

Here's a patch which, as per an off-list discussion between Andres,
Amit, and myself, removes the use of the spinlock for most
send/receive operations in favor of memory barriers and the atomics
support for 8-byte reads and writes. I tested with a pgbench -i -s
300 database with pgbench_accounts_pkey dropped and
max_parallel_workers_per_gather boosted to 10. I used this query:

select aid, count(*) from pgbench_accounts group by 1 having count(*) > 1;

which produces this plan:

Finalize GroupAggregate (cost=1235865.51..5569468.75 rows=10000000 width=12)
Group Key: aid
Filter: (count(*) > 1)
-> Gather Merge (cost=1235865.51..4969468.75 rows=30000000 width=12)
Workers Planned: 6
-> Partial GroupAggregate (cost=1234865.42..1322365.42
rows=5000000 width=12)
Group Key: aid
-> Sort (cost=1234865.42..1247365.42 rows=5000000 width=4)
Sort Key: aid
-> Parallel Seq Scan on pgbench_accounts
(cost=0.00..541804.00 rows=5000000 width=4)
(10 rows)

On hydra (PPC), these changes didn't help much. Timings:

master: 29605.582, 29753.417, 30160.485
patch: 28218.396, 27986.951, 26465.584

That's about a 5-6% improvement. On my MacBook, though, the
improvement was quite a bit more:

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Any reviews appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

shm-mq-less-spinlocks-v1.2.patchapplication/octet-stream; name=shm-mq-less-spinlocks-v1.2.patchDownload
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 770559a03e..75e3edfe47 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,27 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +71,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +150,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +179,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +349,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +490,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterparty is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
 		/* Bail out if the queue has been detached. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +921,12 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mq->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +952,18 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
+
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
 
-			/* Write as much data as we can via a single memcpy(). */
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1009,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1041,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1071,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1087,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1117,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1133,23 +1157,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 }
 
 /*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes read.
  */
 static void
@@ -1157,63 +1164,51 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes().
+	 * We only need a read barrier here because the increment of mq_bytes_read
+	 * is actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
 /*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with shm_mq_get_bytes_written's read
+	 * barrier.
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
#18Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
Re: [POC] Faster processing at Gather node

Hi,

On 2017-11-04 16:38:31 +0530, Robert Haas wrote:

On hydra (PPC), these changes didn't help much. Timings:

master: 29605.582, 29753.417, 30160.485
patch: 28218.396, 27986.951, 26465.584

That's about a 5-6% improvement. On my MacBook, though, the
improvement was quite a bit more:

Hm. I wonder why that is. Random unverified theories (this plane doesn't
have power supplies for us mere mortals in coach, therefore I'm not
going to run benchmarks):

- Due to the lower per-core performance the leader backend is so
bottlenecked that there's just not a whole lot of
contention. Therefore removing the lock doesn't help much. That might
be a bit different if the redundant projection is removed.
- IO performance on hydra is notoriously bad so there might just not be
enough data available for workers to process rows fast enough to cause
contention.

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Neat!

- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.

Maybe mention that there's a fallback for ancient platforms?

@@ -900,15 +921,12 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
}
else if (available == 0)
{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mq->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			SetLatch(&mq->mq_receiver->procLatch);

Might not hurt to assert mqh_counterparty_attached, just for slightly
easier debugging.

@@ -983,19 +1009,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
for (;;)
{
Size		offset;
-		bool		detached;
+		uint64		read;
/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);

Theoretically we don't actually need to re-read this from shared memory,
we could just have the information in the local memory too. Right?
Doubtful however that it's important enough to bother given that we've
to move the cacheline for `mq_bytes_written` anyway, and will later also
dirty it to *update* `mq_bytes_read`. Similarly on the write side.

-/*
* Increment the number of bytes read.
*/
static void
@@ -1157,63 +1164,51 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
{
PGPROC *sender;

-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes().
+	 * We only need a read barrier here because the increment of mq_bytes_read
+	 * is actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */

s/the bus lock/a bus lock/? Might also be worth rephrasing away from
bus locks - there are a lot of different ways atomics are implemented.

/*
- * Get the number of bytes written. The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
- uint64 v;
-
- SpinLockAcquire(&mq->mq_mutex);
- v = mq->mq_bytes_written;
- *detached = mq->mq_detached;
- SpinLockRelease(&mq->mq_mutex);
-
- return v;
-}

You reference this function in a comment elsewhere:

+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with shm_mq_get_bytes_written's read
+	 * barrier.
+	 */
+	pg_write_barrier();

Greetings,

Andres Freund


#19Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#18)
4 attachment(s)
Re: [POC] Faster processing at Gather node

On Sat, Nov 4, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Neat!

With the attached stack of 4 patches, I get: 10811.768 ms, 10743.424
ms, 10632.006 ms, about a 49% improvement median-to-median. Haven't
tried it on hydra or any other test cases yet.

skip-gather-project-v1.patch does what it says on the tin. I still
don't have a test case for this, and I didn't find that it helped very
much, but it would probably help more in a test case with more
columns, and you said this looked like a big bottleneck in your
testing, so here you go.

shm-mq-less-spinlocks-v2.patch is updated from the version I posted
before based on your review comments. I don't think it's really
necessary to mention that the 8-byte atomics have fallbacks here;
whatever needs to be said about that should be said in some central
place that talks about atomics, not in each user individually. I
agree that there might be some further speedups possible by caching
some things in local memory, but I haven't experimented with that.

shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only
consume input from the shared queue when the amount of unconsumed
input exceeds 1/4 of the queue size. This caused a large performance
improvement in my testing because it causes the number of times the
latch gets set to drop dramatically. I experimented a bit with
thresholds of 1/8 and 1/2 before settling on 1/4; 1/4 seems to be
enough to capture most of the benefit.
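The thresholding can be sketched in isolation (hypothetical names; a model of the idea, not the patch itself): the receiver tracks its progress privately and only publishes it, waking the sender, once at least a quarter of the ring has been consumed since the last update.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 1024

static uint64_t bytes_read_shared = 0; /* progress visible to the sender */
static uint64_t consumed_locally = 0;  /* receiver-private progress */
static int sender_latch_sets = 0;      /* stands in for SetLatch() on sender */

/*
 * Receiver-side consumption: publish progress (and wake the sender) only
 * once 1/4 of the queue has been consumed since the last published update.
 */
static void consume(uint64_t n)
{
    consumed_locally += n;
    if (consumed_locally - bytes_read_shared >= RING_SIZE / 4)
    {
        bytes_read_shared = consumed_locally;
        sender_latch_sets++;
    }
}
```

Consuming 1000 bytes in 100-byte chunks sets the sender's latch only three times under this scheme, versus ten times if progress were published on every read.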

remove-memory-leak-protection-v1.patch removes the memory leak
protection that Tom installed upon discovering that the original
version of tqueue.c leaked memory like crazy. I think that it
shouldn't do that any more, courtesy of
6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we
can avoid a whole lot of tuple copying in Gather Merge and a much more
modest amount of overhead in Gather. Since my test case exercised
Gather Merge, this bought ~400 ms or so.

Even with all of these patches applied, there's clearly still room for
more optimization, but MacOS's "sample" profiler seems to show that
the bottlenecks are largely shifting elsewhere:

Sort by top of stack, same collapsed (when >= 5):
slot_getattr (in postgres) 706
slot_deform_tuple (in postgres) 560
ExecAgg (in postgres) 378
ExecInterpExpr (in postgres) 372
AllocSetAlloc (in postgres) 319
_platform_memmove$VARIANT$Haswell (in
libsystem_platform.dylib) 314
read (in libsystem_kernel.dylib) 303
heap_compare_slots (in postgres) 296
combine_aggregates (in postgres) 273
shm_mq_receive_bytes (in postgres) 272

I'm probably not super-excited about spending too much more time
trying to make the _platform_memmove time (only 20% or so of which
seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time
go down until, say, somebody JIT's slot_getattr and slot_deform_tuple.
:-)

One thing that might be worth doing is hammering on the AllocSetAlloc
time. I think that's largely caused by allocating space for heap
tuples and then freeing them and allocating space for new heap tuples.
Gather/Gather Merge are guilty of that, but I think there may be other
places in the executor with the same issue. Maybe we could have
fixed-size buffers for small tuples that just get reused and only
palloc for large tuples (cf. SLAB_SLOT_SIZE).
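A sketch of that idea, under the assumption that most tuples fit in a fixed-size slot (a standalone model, not executor code; names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

#define SLOT_SIZE 1024                 /* cf. SLAB_SLOT_SIZE */

static char fixed_buf[SLOT_SIZE];
static int alloc_count = 0;            /* stands in for AllocSetAlloc hits */

/*
 * Return storage for a tuple of len bytes: reuse the fixed buffer for
 * small tuples and allocate only for large ones.  The caller must copy
 * out of (or finish with) a small tuple before requesting the next one,
 * and frees only tuples flagged as large.
 */
static void *tuple_space(size_t len, int *is_large)
{
    if (len <= SLOT_SIZE)
    {
        *is_large = 0;
        return fixed_buf;              /* no allocator traffic at all */
    }
    *is_large = 1;
    alloc_count++;
    return malloc(len);
}
```

In this model a stream of small tuples generates zero allocator calls; only the occasional oversized tuple pays for an allocation, which is the AllocSetAlloc traffic the paragraph above wants to eliminate.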

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

skip-gather-project-v1.patchapplication/octet-stream; name=skip-gather-project-v1.patchDownload
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 5dfc49deb9..837abc0f01 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -23,8 +23,6 @@
 #include "utils/memutils.h"
 
 
-static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
-
 
 /*
  * ExecScanFetch -- check interrupts & fetch next potential tuple
@@ -237,8 +235,9 @@ void
 ExecAssignScanProjectionInfo(ScanState *node)
 {
 	Scan	   *scan = (Scan *) node->ps.plan;
+	TupleDesc	tupdesc = node->ss_ScanTupleSlot->tts_tupleDescriptor;
 
-	ExecAssignScanProjectionInfoWithVarno(node, scan->scanrelid);
+	ExecConditionalAssignProjectionInfo(&node->ps, tupdesc, scan->scanrelid);
 }
 
 /*
@@ -248,75 +247,9 @@ ExecAssignScanProjectionInfo(ScanState *node)
 void
 ExecAssignScanProjectionInfoWithVarno(ScanState *node, Index varno)
 {
-	Scan	   *scan = (Scan *) node->ps.plan;
-
-	if (tlist_matches_tupdesc(&node->ps,
-							  scan->plan.targetlist,
-							  varno,
-							  node->ss_ScanTupleSlot->tts_tupleDescriptor))
-		node->ps.ps_ProjInfo = NULL;
-	else
-		ExecAssignProjectionInfo(&node->ps,
-								 node->ss_ScanTupleSlot->tts_tupleDescriptor);
-}
-
-static bool
-tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
-{
-	int			numattrs = tupdesc->natts;
-	int			attrno;
-	bool		hasoid;
-	ListCell   *tlist_item = list_head(tlist);
-
-	/* Check the tlist attributes */
-	for (attrno = 1; attrno <= numattrs; attrno++)
-	{
-		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
-		Var		   *var;
-
-		if (tlist_item == NULL)
-			return false;		/* tlist too short */
-		var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
-		if (!var || !IsA(var, Var))
-			return false;		/* tlist item not a Var */
-		/* if these Asserts fail, planner messed up */
-		Assert(var->varno == varno);
-		Assert(var->varlevelsup == 0);
-		if (var->varattno != attrno)
-			return false;		/* out of order */
-		if (att_tup->attisdropped)
-			return false;		/* table contains dropped columns */
-
-		/*
-		 * Note: usually the Var's type should match the tupdesc exactly, but
-		 * in situations involving unions of columns that have different
-		 * typmods, the Var may have come from above the union and hence have
-		 * typmod -1.  This is a legitimate situation since the Var still
-		 * describes the column, just not as exactly as the tupdesc does. We
-		 * could change the planner to prevent it, but it'd then insert
-		 * projection steps just to convert from specific typmod to typmod -1,
-		 * which is pretty silly.
-		 */
-		if (var->vartype != att_tup->atttypid ||
-			(var->vartypmod != att_tup->atttypmod &&
-			 var->vartypmod != -1))
-			return false;		/* type mismatch */
-
-		tlist_item = lnext(tlist_item);
-	}
-
-	if (tlist_item)
-		return false;			/* tlist too long */
-
-	/*
-	 * If the plan context requires a particular hasoid setting, then that has
-	 * to match, too.
-	 */
-	if (ExecContextForcesOids(ps, &hasoid) &&
-		hasoid != tupdesc->tdhasoid)
-		return false;
+	TupleDesc	tupdesc = node->ss_ScanTupleSlot->tts_tupleDescriptor;
 
-	return true;
+	ExecConditionalAssignProjectionInfo(&node->ps, tupdesc, varno);
 }
 
 /*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index e8c06c7605..876439835a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -56,6 +56,7 @@
 #include "utils/typcache.h"
 
 
+static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
 static void ShutdownExprContext(ExprContext *econtext, bool isCommit);
 
 
@@ -504,6 +505,85 @@ ExecAssignProjectionInfo(PlanState *planstate,
 
 
 /* ----------------
+ *		ExecConditionalAssignProjectionInfo
+ *
+ * as ExecAssignProjectionInfo, but store NULL rather than building projection
+ * info if no projection is required
+ * ----------------
+ */
+void
+ExecConditionalAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc,
+									Index varno)
+{
+	if (tlist_matches_tupdesc(planstate,
+							  planstate->plan->targetlist,
+							  varno,
+							  inputDesc))
+		planstate->ps_ProjInfo = NULL;
+	else
+		ExecAssignProjectionInfo(planstate, inputDesc);
+}
+
+static bool
+tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
+{
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+	bool		hasoid;
+	ListCell   *tlist_item = list_head(tlist);
+
+	/* Check the tlist attributes */
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		Var		   *var;
+
+		if (tlist_item == NULL)
+			return false;		/* tlist too short */
+		var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
+		if (!var || !IsA(var, Var))
+			return false;		/* tlist item not a Var */
+		/* if these Asserts fail, planner messed up */
+		Assert(var->varno == varno);
+		Assert(var->varlevelsup == 0);
+		if (var->varattno != attrno)
+			return false;		/* out of order */
+		if (att_tup->attisdropped)
+			return false;		/* table contains dropped columns */
+
+		/*
+		 * Note: usually the Var's type should match the tupdesc exactly, but
+		 * in situations involving unions of columns that have different
+		 * typmods, the Var may have come from above the union and hence have
+		 * typmod -1.  This is a legitimate situation since the Var still
+		 * describes the column, just not as exactly as the tupdesc does. We
+		 * could change the planner to prevent it, but it'd then insert
+		 * projection steps just to convert from specific typmod to typmod -1,
+		 * which is pretty silly.
+		 */
+		if (var->vartype != att_tup->atttypid ||
+			(var->vartypmod != att_tup->atttypmod &&
+			 var->vartypmod != -1))
+			return false;		/* type mismatch */
+
+		tlist_item = lnext(tlist_item);
+	}
+
+	if (tlist_item)
+		return false;			/* tlist too long */
+
+	/*
+	 * If the plan context requires a particular hasoid setting, then that has
+	 * to match, too.
+	 */
+	if (ExecContextForcesOids(ps, &hasoid) &&
+		hasoid != tupdesc->tdhasoid)
+		return false;
+
+	return true;
+}
+
+/* ----------------
  *		ExecFreeExprContext
  *
  * A plan node's ExprContext should be freed explicitly during executor
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 639f4f5af8..a1f0f7800e 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -102,12 +102,6 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
 
 	/*
-	 * Initialize result tuple type and projection info.
-	 */
-	ExecAssignResultTypeFromTL(&gatherstate->ps);
-	ExecAssignProjectionInfo(&gatherstate->ps, NULL);
-
-	/*
 	 * Initialize funnel slot to same tuple descriptor as outer plan.
 	 */
 	if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
@@ -115,6 +109,12 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
 	ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);
 
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&gatherstate->ps);
+	ExecConditionalAssignProjectionInfo(&gatherstate->ps, tupDesc, OUTER_VAR);
+
 	return gatherstate;
 }
 
@@ -217,6 +217,10 @@ ExecGather(PlanState *pstate)
 	if (TupIsNull(slot))
 		return NULL;
 
+	/* If no projection is required, we're done. */
+	if (node->ps.ps_ProjInfo == NULL)
+		return slot;
+
 	/*
 	 * Form the result tuple using ExecProject(), and return it.
 	 */
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 5625b12521..6da607b7c4 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -115,10 +115,19 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
 	outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
 
 	/*
+	 * Store the tuple descriptor into gather merge state, so we can use it
+	 * while initializing the gather merge slots.
+	 */
+	if (!ExecContextForcesOids(&gm_state->ps, &hasoid))
+		hasoid = false;
+	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
+	gm_state->tupDesc = tupDesc;
+
+	/*
 	 * Initialize result tuple type and projection info.
 	 */
 	ExecAssignResultTypeFromTL(&gm_state->ps);
-	ExecAssignProjectionInfo(&gm_state->ps, NULL);
+	ExecConditionalAssignProjectionInfo(&gm_state->ps, tupDesc, OUTER_VAR);
 
 	/*
 	 * initialize sort-key information
@@ -150,15 +159,6 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Store the tuple descriptor into gather merge state, so we can use it
-	 * while initializing the gather merge slots.
-	 */
-	if (!ExecContextForcesOids(&gm_state->ps, &hasoid))
-		hasoid = false;
-	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
-	gm_state->tupDesc = tupDesc;
-
 	/* Now allocate the workspace for gather merge */
 	gather_merge_setup(gm_state);
 
@@ -253,6 +253,10 @@ ExecGatherMerge(PlanState *pstate)
 	if (TupIsNull(slot))
 		return NULL;
 
+	/* If no projection is required, we're done. */
+	if (node->ps.ps_ProjInfo == NULL)
+		return slot;
+
 	/*
 	 * Form the result tuple using ExecProject(), and return it.
 	 */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c4ecf0d50f..9c268407e7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -495,6 +495,8 @@ extern void ExecAssignResultTypeFromTL(PlanState *planstate);
 extern TupleDesc ExecGetResultType(PlanState *planstate);
 extern void ExecAssignProjectionInfo(PlanState *planstate,
 						 TupleDesc inputDesc);
+extern void ExecConditionalAssignProjectionInfo(PlanState *planstate,
+									TupleDesc inputDesc, Index varno);
 extern void ExecFreeExprContext(PlanState *planstate);
 extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
 extern void ExecAssignScanTypeFromOuterPlan(ScanState *scanstate);
Attachment: shm-mq-less-spinlocks-v2.patch (application/octet-stream)
From 10feddc5228ff3b4ae026717717f4a143dcc0e64 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:53 +0100
Subject: [PATCH 2/4] shm-mq-less-spinlocks-v2.patch

---
 src/backend/storage/ipc/shm_mq.c | 237 +++++++++++++++++++--------------------
 1 file changed, 116 insertions(+), 121 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 770559a03e..75c6bbd4fb 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,27 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +71,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +150,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +179,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +349,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +490,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterparty is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
 		/* Bail out if the queue has been detached. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +921,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mqh->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +953,18 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
+
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
 
-			/* Write as much data as we can via a single memcpy(). */
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1010,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1042,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1072,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1088,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1118,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1133,23 +1158,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 }
 
 /*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes read.
  */
 static void
@@ -1157,63 +1165,50 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes().
+	 * We only need a read barrier here because the increment of mq_bytes_read
+	 * is actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method should be cheaper.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
 /*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with the read barrier found in
+	 * shm_mq_receive_bytes.
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
-- 
2.13.5 (Apple Git-94)

Attachment: shm-mq-reduce-receiver-latch-set-v1.patch (application/octet-stream)
From c8e330ca4d7def18cbfea5e54d03bdaa6775e42c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 3/4] shm-mq-reduce-receiver-latch-set-v1

---
 src/backend/storage/ipc/shm_mq.c | 69 +++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 75c6bbd4fb..ceeb487390 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -142,10 +142,10 @@ struct shm_mq_handle
 };
 
 static void shm_mq_detach_internal(shm_mq *mq);
-static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
-static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
-					 bool nowait, Size *nbytesp, void **datap);
+static shm_mq_result shm_mq_receive_bytes(shm_mq_handle *mqh,
+				  Size bytes_needed, bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
@@ -585,8 +585,14 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_counterparty_attached = true;
 	}
 
-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it
+	 * too often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
 	{
 		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
 		mqh->mqh_consume_pending = 0;
@@ -597,7 +603,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	{
 		/* Try to receive the message length word. */
 		Assert(mqh->mqh_partial_bytes < sizeof(Size));
-		res = shm_mq_receive_bytes(mq, sizeof(Size) - mqh->mqh_partial_bytes,
+		res = shm_mq_receive_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								   nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
@@ -617,13 +623,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				/*
-				 * Technically, we could consume the message length
-				 * information at this point, but the extra write to shared
-				 * memory wouldn't be free and in most cases we would reap no
-				 * benefit.
-				 */
-				mqh->mqh_consume_pending = needed;
+				mqh->mqh_consume_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -635,7 +635,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(sizeof(Size)));
+			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -654,7 +654,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			}
 			Assert(mqh->mqh_buflen >= sizeof(Size));
 
-			/* Copy and consume partial length word. */
+			/* Copy partial length word; remember to consume it. */
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
@@ -662,7 +662,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(lengthbytes));
+			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -684,13 +684,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * we need not copy the data and can return a pointer directly into
 		 * shared memory.
 		 */
-		res = shm_mq_receive_bytes(mq, nbytes, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, nbytes, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending = MAXALIGN(nbytes);
+			mqh->mqh_consume_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -730,13 +730,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_partial_bytes += rb;
 
 		/*
-		 * Update count of bytes read, with alignment padding.  Note that this
-		 * will never actually insert any padding except at the end of a
-		 * message, because the buffer size is a multiple of MAXIMUM_ALIGNOF,
-		 * and each read and write is as well.
+		 * Update count of bytes that can be consumed, accounting for
+		 * alignment padding.  Note that this will never actually insert any
+		 * padding except at the end of a message, because the buffer size is
+		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		shm_mq_inc_bytes_read(mq, MAXALIGN(rb));
+		mqh->mqh_consume_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -744,7 +744,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 
 		/* Wait for some more data. */
 		still_needed = nbytes - mqh->mqh_partial_bytes;
-		res = shm_mq_receive_bytes(mq, still_needed, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, still_needed, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb > still_needed)
@@ -1000,9 +1000,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
  * is SHM_MQ_SUCCESS.
  */
 static shm_mq_result
-shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
+shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 					 Size *nbytesp, void **datap)
 {
+	shm_mq	   *mq = mqh->mqh_queue;
 	Size		ringsize = mq->mq_ring_size;
 	uint64		used;
 	uint64		written;
@@ -1014,7 +1015,13 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 
 		/* Get bytes written, so we can compute what's available to read. */
 		written = pg_atomic_read_u64(&mq->mq_bytes_written);
-		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+
+		/*
+		 * Get bytes read.  Include bytes we could consume but have not yet
+		 * consumed.
+		 */
+		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
+			mqh->mqh_consume_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1045,6 +1052,16 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
+		/*
+		 * We didn't get enough data to satisfy the request, so mark any
+		 * data previously-consumed as read to make more buffer space.
+		 */
+		if (mqh->mqh_consume_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
+			mqh->mqh_consume_pending = 0;
+		}
+
 		/* Skip manipulation of our latch if nowait = true. */
 		if (nowait)
 			return SHM_MQ_WOULD_BLOCK;
-- 
2.13.5 (Apple Git-94)

Attachment: remove-memory-leak-protection-v1.patch (application/octet-stream)
From 1056a963259ef0601bba6b1063d4a9d78bb52f57 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 23:44:02 +0100
Subject: [PATCH 4/4] remove-memory-leak-protection-v1

---
 src/backend/executor/nodeGather.c      | 14 ++------------
 src/backend/executor/nodeGatherMerge.c | 10 +---------
 src/backend/executor/tqueue.c          |  2 ++
 3 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a1f0f7800e..1bdde15f89 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -129,7 +129,6 @@ static TupleTableSlot *
 ExecGather(PlanState *pstate)
 {
 	GatherState *node = castNode(GatherState, pstate);
-	TupleTableSlot *fslot = node->funnel_slot;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
 
@@ -201,11 +200,8 @@ ExecGather(PlanState *pstate)
 
 	/*
 	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.  This will also clear
-	 * any previous tuple returned by a TupleQueueReader; to make sure we
-	 * don't leave a dangling pointer around, clear the working slot first.
+	 * storage allocated in the previous tuple cycle.
 	 */
-	ExecClearTuple(fslot);
 	econtext = node->ps.ps_ExprContext;
 	ResetExprContext(econtext);
 
@@ -254,7 +250,6 @@ gather_getnext(GatherState *gatherstate)
 	PlanState  *outerPlan = outerPlanState(gatherstate);
 	TupleTableSlot *outerTupleSlot;
 	TupleTableSlot *fslot = gatherstate->funnel_slot;
-	MemoryContext tupleContext = gatherstate->ps.ps_ExprContext->ecxt_per_tuple_memory;
 	HeapTuple	tup;
 
 	while (gatherstate->nreaders > 0 || gatherstate->need_to_scan_locally)
@@ -263,12 +258,7 @@ gather_getnext(GatherState *gatherstate)
 
 		if (gatherstate->nreaders > 0)
 		{
-			MemoryContext oldContext;
-
-			/* Run TupleQueueReaders in per-tuple context */
-			oldContext = MemoryContextSwitchTo(tupleContext);
 			tup = gather_readnext(gatherstate);
-			MemoryContextSwitchTo(oldContext);
 
 			if (HeapTupleIsValid(tup))
 			{
@@ -276,7 +266,7 @@ gather_getnext(GatherState *gatherstate)
 							   fslot,	/* slot in which to store the tuple */
 							   InvalidBuffer,	/* buffer associated with this
 												 * tuple */
-							   false);	/* slot should not pfree tuple */
+							   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
 		}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 6da607b7c4..1f9818233b 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -605,7 +605,7 @@ load_tuple_array(GatherMergeState *gm_state, int reader)
 								  &tuple_buffer->done);
 		if (!HeapTupleIsValid(tuple))
 			break;
-		tuple_buffer->tuple[i] = heap_copytuple(tuple);
+		tuple_buffer->tuple[i] = tuple;
 		tuple_buffer->nTuples++;
 	}
 }
@@ -669,7 +669,6 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 								&tuple_buffer->done);
 		if (!HeapTupleIsValid(tup))
 			return false;
-		tup = heap_copytuple(tup);
 
 		/*
 		 * Attempt to read more tuples in nowait mode and store them in the
@@ -699,20 +698,13 @@ gm_readnext_tuple(GatherMergeState *gm_state, int nreader, bool nowait,
 {
 	TupleQueueReader *reader;
 	HeapTuple	tup;
-	MemoryContext oldContext;
-	MemoryContext tupleContext;
 
 	/* Check for async events, particularly messages from workers. */
 	CHECK_FOR_INTERRUPTS();
 
 	/* Attempt to read a tuple. */
 	reader = gm_state->reader[nreader - 1];
-
-	/* Run TupleQueueReaders in per-tuple context */
-	tupleContext = gm_state->ps.ps_ExprContext->ecxt_per_tuple_memory;
-	oldContext = MemoryContextSwitchTo(tupleContext);
 	tup = TupleQueueReaderNext(reader, nowait, done);
-	MemoryContextSwitchTo(oldContext);
 
 	return tup;
 }
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index e9a5d5a1a5..9fb4d7bc79 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -161,6 +161,8 @@ DestroyTupleQueueReader(TupleQueueReader *reader)
  * is set to true when there are no remaining tuples and otherwise to false.
  *
  * The returned tuple, if any, is allocated in CurrentMemoryContext.
+ * Note that this routine must not leak memory!  (We used to allow that,
+ * but not any more.)
  *
  * Even when shm_mq_receive() returns SHM_MQ_WOULD_BLOCK, this can still
  * accumulate bytes from a partially-read message, so it's useful to call
-- 
2.13.5 (Apple Git-94)

#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#19)
Re: [POC] Faster processing at Gather node

On 2017-11-05 01:05:59 +0100, Robert Haas wrote:

skip-gather-project-v1.patch does what it says on the tin. I still
don't have a test case for this, and I didn't find that it helped very
much, but it would probably help more in a test case with more
columns, and you said this looked like a big bottleneck in your
testing, so here you go.

The query where that showed a big benefit was

SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;

(i.e. a not very selective filter, and then just throwing the results away)

still shows quite massive benefits:

before:
set parallel_setup_cost=0;set parallel_tuple_cost=0;set min_parallel_table_scan_size=0;set max_parallel_workers_per_gather=8;
tpch_5[17938][1]=# explain analyze SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Limit (cost=635802.67..635802.69 rows=1 width=127) (actual time=8675.097..8675.097 rows=0 loops=1)
│ -> Gather (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.289..7904.849 rows=26989780 loops=1)
│ Workers Planned: 8
│ Workers Launched: 7
│ -> Parallel Seq Scan on lineitem (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.124..528.667 rows=3373722 loops=8)
│ Filter: (l_suppkey > 5012)
│ Rows Removed by Filter: 376252
│ Planning time: 0.098 ms
│ Execution time: 8676.125 ms
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(9 rows)
after:
tpch_5[19754][1]=# EXPLAIN ANALYZE SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Limit (cost=635802.67..635802.69 rows=1 width=127) (actual time=5984.916..5984.916 rows=0 loops=1)
│ -> Gather (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.214..5123.238 rows=26989780 loops=1)
│ Workers Planned: 8
│ Workers Launched: 7
│ -> Parallel Seq Scan on lineitem (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.025..649.887 rows=3373722 loops=8)
│ Filter: (l_suppkey > 5012)
│ Rows Removed by Filter: 376252
│ Planning time: 0.076 ms
│ Execution time: 5986.171 ms
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(9 rows)

so there clearly is still benefit (this is scale 100, but that shouldn't
make much of a difference).

Did not review the code.

shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only
consume input from the shared queue when the amount of unconsumed
input exceeds 1/4 of the queue size. This caused a large performance
improvement in my testing because it causes the number of times the
latch gets set to drop dramatically. I experimented a bit with
thresholds of 1/8 and 1/2 before settling on 1/4; 1/4 seems to be
enough to capture most of the benefit.

Hm. Is consuming the relevant part, or notifying the sender about it? I
suspect most of the benefit can be captured by updating bytes read (and
similarly on the other side w/ bytes written), but not setting the latch
unless thresholds are reached. The advantage of updating the value,
even without notifying the other side, is that in the common case that
the other side gets around to checking the queue without having blocked,
it'll see the updated value. If that works, that'd address the issue
that we might wait unnecessarily in a number of common cases.

Did not review the code.
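A toy sketch of the idea above (illustrative names, not the real shm_mq interface): the receiver publishes its read position on every consume, but only sets the sender's latch once a quarter of the queue has been freed since the last notification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy single-producer/single-consumer queue; field and function names are
 * made up for illustration, not taken from shm_mq.c.
 */
typedef struct
{
    uint64_t size;          /* ring size in bytes */
    uint64_t bytes_read;    /* read position visible to the sender */
    uint64_t local_read;    /* receiver-private read position */
    uint64_t last_notified; /* read position at the last latch set */
    int      latch_sets;    /* stands in for the number of SetLatch calls */
} toy_mq;

/*
 * Consume n bytes.  The read position is published on every call (a cheap
 * store, so a sender that checks the queue without blocking sees fresh
 * state), but the sender's latch is only set once at least size/4 bytes
 * have been freed since the previous notification.
 */
static void
toy_mq_consume(toy_mq *mq, uint64_t n)
{
    mq->local_read += n;
    mq->bytes_read = mq->local_read;    /* always publish */
    if (mq->local_read - mq->last_notified >= mq->size / 4)
    {
        mq->last_notified = mq->local_read;
        mq->latch_sets++;               /* SetLatch(sender) would go here */
    }
}
```

With a 1 kB queue and ten 100-byte reads, the sender is woken only three times instead of ten, while a non-blocked sender still sees the up-to-date read position.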

remove-memory-leak-protection-v1.patch removes the memory leak
protection that Tom installed upon discovering that the original
version of tqueue.c leaked memory like crazy. I think that it
shouldn't do that any more, courtesy of
6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we
can avoid a whole lot of tuple copying in Gather Merge and a much more
modest amount of overhead in Gather.

Yup, that conceptually makes sense.

Did not review the code.

Even with all of these patches applied, there's clearly still room for
more optimization, but MacOS's "sample" profiler seems to show that
the bottlenecks are largely shifting elsewhere:

Sort by top of stack, same collapsed (when >= 5):
slot_getattr (in postgres) 706
slot_deform_tuple (in postgres) 560
ExecAgg (in postgres) 378
ExecInterpExpr (in postgres) 372
AllocSetAlloc (in postgres) 319
_platform_memmove$VARIANT$Haswell (in
libsystem_platform.dylib) 314
read (in libsystem_kernel.dylib) 303
heap_compare_slots (in postgres) 296
combine_aggregates (in postgres) 273
shm_mq_receive_bytes (in postgres) 272

Interesting.  Here it's
+    8.79%  postgres  postgres            [.] ExecAgg
+    6.52%  postgres  postgres            [.] slot_deform_tuple
+    5.65%  postgres  postgres            [.] slot_getattr
+    4.59%  postgres  postgres            [.] shm_mq_send_bytes
+    3.66%  postgres  postgres            [.] ExecInterpExpr
+    3.44%  postgres  postgres            [.] AllocSetAlloc
+    3.08%  postgres  postgres            [.] heap_fill_tuple
+    2.34%  postgres  postgres            [.] heap_getnext
+    2.25%  postgres  postgres            [.] finalize_aggregates
+    2.08%  postgres  libc-2.24.so        [.] __memmove_avx_unaligned_erms
+    2.05%  postgres  postgres            [.] heap_compare_slots
+    1.99%  postgres  postgres            [.] execTuplesMatch
+    1.83%  postgres  postgres            [.] ExecStoreTuple
+    1.83%  postgres  postgres            [.] shm_mq_receive
+    1.81%  postgres  postgres            [.] ExecScan

I'm probably not super-excited about spending too much more time
trying to make the _platform_memmove time (only 20% or so of which
seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time
go down until, say, somebody JIT's slot_getattr and slot_deform_tuple.
:-)

Hm, let's say somebody were working on something like that. In that case
the benefits for this precise plan wouldn't yet be that big because a
good chunk of slot_getattr calls come from execTuplesMatch() which
doesn't really provide enough context to do JITing (when used for
hashaggs, there is more context, so it's JITed). Similarly gather merge's
heap_compare_slots() doesn't provide such context.

It's about ~9% currently, largely due to the faster aggregate
invocation. But the big benefit here would be all the deforming and the
comparisons...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: [POC] Faster processing at Gather node

On Sun, Nov 5, 2017 at 2:24 AM, Andres Freund <andres@anarazel.de> wrote:

shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only
consume input from the shared queue when the amount of unconsumed
input exceeds 1/4 of the queue size. This caused a large performance
improvement in my testing because it causes the number of times the
latch gets set to drop dramatically. I experimented a bit with
thresholds of 1/8 and 1/2 before settling on 1/4; 1/4 seems to be
enough to capture most of the benefit.

Hm. Is consuming the relevant part, or notifying the sender about it? I
suspect most of the benefit can be captured by updating bytes read (and
similarly on the other side w/ bytes written), but not setting the latch
unless thresholds are reached. The advantage of updating the value,
even without notifying the other side, is that in the common case that
the other side gets around to checking the queue without having blocked,
it'll see the updated value. If that works, that'd address the issue
that we might wait unnecessarily in a number of common cases.

I think it's mostly notifying the sender. Sending SIGUSR1 over and
over again isn't free, and it shows up in profiling. I thought about
what you're proposing here, but it seemed more complicated to
implement, and I'm not sure that there would be any benefit. The
reason is because, with these patches applied, even a radical
expansion of the queue size doesn't produce much incremental
performance benefit at least in the test case I was using. I can
increase the size of the tuple queues 10x or 100x and it really
doesn't help very much. And consuming sooner (but sometimes without
notifying) seems very similar to making the queue slightly bigger.

Also, what I see in general is that the CPU usage on the leader goes
to 100% but the workers are only maybe 20% saturated. Making the
leader work any harder than absolutely necessary therefore seems
like it's probably counterproductive. I may be wrong, but it looks to
me like most of the remaining overhead seems to come from (1) the
synchronization overhead associated with memory barriers and (2)
backend-private work that isn't as cheap as would be ideal - e.g.
palloc overhead.

Interesting.  Here it's
+    8.79%  postgres  postgres            [.] ExecAgg
+    6.52%  postgres  postgres            [.] slot_deform_tuple
+    5.65%  postgres  postgres            [.] slot_getattr
+    4.59%  postgres  postgres            [.] shm_mq_send_bytes
+    3.66%  postgres  postgres            [.] ExecInterpExpr
+    3.44%  postgres  postgres            [.] AllocSetAlloc
+    3.08%  postgres  postgres            [.] heap_fill_tuple
+    2.34%  postgres  postgres            [.] heap_getnext
+    2.25%  postgres  postgres            [.] finalize_aggregates
+    2.08%  postgres  libc-2.24.so        [.] __memmove_avx_unaligned_erms
+    2.05%  postgres  postgres            [.] heap_compare_slots
+    1.99%  postgres  postgres            [.] execTuplesMatch
+    1.83%  postgres  postgres            [.] ExecStoreTuple
+    1.83%  postgres  postgres            [.] shm_mq_receive
+    1.81%  postgres  postgres            [.] ExecScan

More or less the same functions, somewhat different order.

I'm probably not super-excited about spending too much more time
trying to make the _platform_memmove time (only 20% or so of which
seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time
go down until, say, somebody JIT's slot_getattr and slot_deform_tuple.
:-)

Hm, let's say somebody were working on something like that. In that case
the benefits for this precise plan wouldn't yet be that big because a
good chunk of slot_getattr calls come from execTuplesMatch() which
doesn't really provide enough context to do JITing (when used for
hashaggs, there is more context, so it's JITed). Similarly gather merge's
heap_compare_slots() doesn't provide such context.

It's about ~9% currently, largely due to the faster aggregate
invocation. But the big benefit here would be all the deforming and the
comparisons...

I'm not sure I follow you here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#22Jim Van Fleet
vanfleet@us.ibm.com
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Ran this change with hammerdb on a power 8 firestone

with 2 socket, 20 core
9.6 base -- 451991 NOPM
0926_master -- 464385 NOPM
11_04master -- 449177 NOPM
11_04_patch -- 431423 NOPM
-- two socket patch is a little down from previous base runs

With one socket
9.6 base -- 393727 NOPM
v10rc1_base -- 350958 NOPM
11_04master -- 306506 NOPM
11_04_patch -- 313179 NOPM
-- one socket 11_04 master is quite a bit down from 9.6 and v10rc1_base
-- the patch is up a bit over the base

Net -- the patch is about the same as current base on two socket, and on
one socket -- consistent with your pgbench (?) findings

As an aside, it is perhaps a worry that one socket is down over 20% from
9.6 and over 10% from v10rc1

Jim

pgsql-hackers-owner@postgresql.org wrote on 11/04/2017 06:08:31 AM:

On hydra (PPC), these changes didn't help much. Timings:

master: 29605.582, 29753.417, 30160.485
patch: 28218.396, 27986.951, 26465.584

That's about a 5-6% improvement. On my MacBook, though, the
improvement was quite a bit more:

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Any reviews appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[attachment "shm-mq-less-spinlocks-v1.2.patch" deleted by Jim Van
Fleet/Austin/Contr/IBM]

#23Andres Freund
andres@anarazel.de
In reply to: Jim Van Fleet (#22)
Re: [POC] Faster processing at Gather node

Hi,

On November 5, 2017 1:33:24 PM PST, Jim Van Fleet <vanfleet@us.ibm.com> wrote:

Ran this change with hammerdb on a power 8 firestone

with 2 socket, 20 core
9.6 base -- 451991 NOPM
0926_master -- 464385 NOPM
11_04master -- 449177 NOPM
11_04_patch -- 431423 NOPM
-- two socket patch is a little down from previous base runs

With one socket
9.6 base -- 393727 NOPM
v10rc1_base -- 350958 NOPM
11_04master -- 306506 NOPM
11_04_patch -- 313179 NOPM
-- one socket 11_04 master is quite a bit down from 9.6 and
v10rc1_base
-- the patch is up a bit over the base

Net -- the patch is about the same as current base on two socket, and
on
one socket -- consistent with your pgbench (?) findings

As an aside, it is perhaps a worry that one socket is down over 20%
from
9.6 and over 10% from v10rc1

What query(s) did you measure?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#20)
Re: [POC] Faster processing at Gather node

On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund <andres@anarazel.de> wrote

On 2017-11-05 01:05:59 +0100, Robert Haas wrote:

skip-gather-project-v1.patch does what it says on the tin. I still
don't have a test case for this, and I didn't find that it helped very
much,

I am also wondering in which case it can help and I can't think of the
case. Basically, as part of projection in the gather, I think we are
just deforming the tuple which we anyway need to perform before
sending the tuple to the client (printtup) or probably at the upper
level of the node.

and you said this looked like a big bottleneck in your
testing, so here you go.

Is it possible that it shows the bottleneck only for an 'explain analyze'
statement, since in that case we don't deform the tuple at a later stage?

The query where that showed a big benefit was

SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;

(i.e a not very selective filter, and then just throwing the results away)

still shows quite massive benefits:

before:
set parallel_setup_cost=0;set parallel_tuple_cost=0;set min_parallel_table_scan_size=0;set max_parallel_workers_per_gather=8;
tpch_5[17938][1]=# explain analyze SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Limit (cost=635802.67..635802.69 rows=1 width=127) (actual time=8675.097..8675.097 rows=0 loops=1)
│ -> Gather (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.289..7904.849 rows=26989780 loops=1)
│ Workers Planned: 8
│ Workers Launched: 7
│ -> Parallel Seq Scan on lineitem (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.124..528.667 rows=3373722 loops=8)
│ Filter: (l_suppkey > 5012)
│ Rows Removed by Filter: 376252
│ Planning time: 0.098 ms
│ Execution time: 8676.125 ms
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(9 rows)
after:
tpch_5[19754][1]=# EXPLAIN ANALYZE SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Limit (cost=635802.67..635802.69 rows=1 width=127) (actual time=5984.916..5984.916 rows=0 loops=1)
│ -> Gather (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.214..5123.238 rows=26989780 loops=1)
│ Workers Planned: 8
│ Workers Launched: 7
│ -> Parallel Seq Scan on lineitem (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.025..649.887 rows=3373722 loops=8)
│ Filter: (l_suppkey > 5012)
│ Rows Removed by Filter: 376252
│ Planning time: 0.076 ms
│ Execution time: 5986.171 ms
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(9 rows)

so there clearly is still benefit (this is scale 100, but that shouldn't
make much of a difference).

Do you see the benefit if the query is executed without using Explain Analyze?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#25Jim Van Fleet
vanfleet@us.ibm.com
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Andres Freund <andres@anarazel.de> wrote on 11/05/2017 03:40:15 PM:

hammerdb, in this configuration, runs a variant of tpcc


What query(s) did you measure?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#26Andres Freund
andres@anarazel.de
In reply to: Jim Van Fleet (#25)
Re: [POC] Faster processing at Gather node

On November 6, 2017 7:30:49 AM PST, Jim Van Fleet <vanfleet@us.ibm.com> wrote:

Andres Freund <andres@anarazel.de> wrote on 11/05/2017 03:40:15 PM:

hammerdb, in this configuration, runs a variant of tpcc

Hard to believe that any of the changes here are relevant in that case - this is parallelism specific stuff. Whereas tpcc is oltp, right?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


#27Jim Van Fleet
vanfleet@us.ibm.com
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

correct


hammerdb, in this configuration, runs a variant of tpcc

Hard to believe that any of the changes here are relevant in that
case - this is parallelism specific stuff. Whereas tpcc is oltp, right?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#28Andres Freund
andres@anarazel.de
In reply to: Jim Van Fleet (#27)
Re: [POC] Faster processing at Gather node

Hi,

Please don't top-quote on postgresql lists.

On 2017-11-06 09:44:24 -0600, Jim Van Fleet wrote:

hammerdb, in this configuration, runs a variant of tpcc

Hard to believe that any of the changes here are relevant in that
case - this is parallelism specific stuff. Whereas tpcc is oltp, right?

correct

In that case, could you provide before/after profiles of the performance
changing runs?

Greetings,

Andres Freund


#29Jim Van Fleet
vanfleet@us.ibm.com
In reply to: Rafia Sabih (#1)
Re: [POC] Faster processing at Gather node

Hi --

pgsql-hackers-owner@postgresql.org wrote on 11/06/2017 09:47:22 AM:

From: Andres Freund <andres@anarazel.de>

Hi,

Please don't top-quote on postgresql lists.

Sorry

On 2017-11-06 09:44:24 -0600, Jim Van Fleet wrote:

hammerdb, in this configuration, runs a variant of tpcc

Hard to believe that any of the changes here are relevant in that
case - this is parallelism specific stuff. Whereas tpcc is oltp,

right?

correct

In that case, could you provide before/after profiles of the performance
changing runs?

sure -- happy to share -- gzipped files (which include trace, perf,
netstat, system data) are large (9G and 13G)
Should I post them on the list or somewhere else (or trim them -- if so,
what would you like to have?)

Jim

#30Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#24)
Re: [POC] Faster processing at Gather node

Hi,

On 2017-11-06 10:56:43 +0530, Amit Kapila wrote:

On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund <andres@anarazel.de> wrote

On 2017-11-05 01:05:59 +0100, Robert Haas wrote:

skip-gather-project-v1.patch does what it says on the tin. I still
don't have a test case for this, and I didn't find that it helped very
much,

I am also wondering in which case it can help and I can't think of the
case.

I'm confused? Isn't it fairly obvious that unnecessarily projecting
at the gather node is wasteful? Obviously depending on the query you'll
see smaller / bigger gains, but that there are benefits should be fairly obvious?

Basically, as part of projection in the gather, I think we are just
deforming the tuple which we anyway need to perform before sending the
tuple to the client (printtup) or probably at the upper level of the
node.

But in most cases you're not going to print millions of tuples, instead
you're going to apply some further operators ontop (e.g. the
OFFSET/LIMIT in my example).

and you said this looked like a big bottleneck in your
testing, so here you go.

Is it possible that it shows the bottleneck only for an 'explain analyze'
statement, since in that case we don't deform the tuple at a later stage?

Doesn't matter, there's an OFFSET/LIMIT on top of the query. Could just as
well be a sort node or something.

The query where that showed a big benefit was

SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;

(i.e. a not very selective filter, and then just throwing the results away)

still shows quite massive benefits:

Do you see the benefit if the query is executed without using Explain Analyze?

Yes.

Before:
tpch_5[11878][1]=# SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 1000000000 LIMIT 1;
...
Time: 7590.196 ms (00:07.590)

After:
Time: 3862.955 ms (00:03.863)

Greetings,

Andres Freund


#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#30)
Re: [POC] Faster processing at Gather node

On Wed, Nov 8, 2017 at 1:02 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-06 10:56:43 +0530, Amit Kapila wrote:

On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund <andres@anarazel.de> wrote

On 2017-11-05 01:05:59 +0100, Robert Haas wrote:

skip-gather-project-v1.patch does what it says on the tin. I still
don't have a test case for this, and I didn't find that it helped very
much,

I am also wondering in which case it can help and I can't think of the
case.

I'm confused? Isn't it fairly obvious that unnecessarily projecting
at the gather node is wasteful? Obviously depending on the query you'll
see smaller / bigger gains, but that there are benefits should be fairly obvious?

I agree that there could be benefits depending on the statement. I
initially thought that we are kind of re-evaluating the expressions in
target list as part of projection even if worker backend has already
done that, but that was not the case and instead, we are deforming the
tuples sent by workers. Now, I think as a general principle it is a
good idea to delay the deforming as much as possible.
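As a sketch of that principle, here is a toy version of incremental deforming in the spirit of slot_deform_tuple's tts_nvalid bookkeeping (the names are illustrative, not the actual executor API): attributes are extracted lazily, only up to the highest one a node actually asks for.

```c
#include <assert.h>

#define MAX_ATTS 8

/*
 * Toy slot with incremental deforming; a stand-in for TupleTableSlot,
 * not the real structure.
 */
typedef struct
{
    const int *raw;              /* the "stored" attribute values */
    int        natts;
    int        values[MAX_ATTS]; /* cache of deformed attributes */
    int        nvalid;           /* attributes deformed so far */
    int        deform_ops;       /* instrumentation: per-attribute work */
} toy_slot;

/* Fetch attribute attnum (1-based), deforming only as far as needed. */
static int
toy_getattr(toy_slot *slot, int attnum)
{
    assert(attnum >= 1 && attnum <= slot->natts);
    while (slot->nvalid < attnum)
    {
        /* simulates one step of the attribute walk in heap_deform_tuple */
        slot->values[slot->nvalid] = slot->raw[slot->nvalid];
        slot->nvalid++;
        slot->deform_ops++;
    }
    return slot->values[attnum - 1];
}
```

If no node above ever touches attributes past the third, the remaining ones are never deformed; eagerly projecting at the Gather node would pay for all of them up front.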

About the patch,

/*
- * Initialize result tuple type and projection info.
- */
- ExecAssignResultTypeFromTL(&gatherstate->ps);
- ExecAssignProjectionInfo(&gatherstate->ps, NULL);
-

- /*
* Initialize funnel slot to same tuple descriptor as outer plan.
*/
if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
@@ -115,6 +109,12 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);

+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&gatherstate->ps);
+ ExecConditionalAssignProjectionInfo(&gatherstate->ps, tupDesc, OUTER_VAR);
+

This change looks suspicious to me. I think here we can't use the
tupDesc constructed from targetlist. One problem, I could see is that
the check for hasOid setting in tlist_matches_tupdesc won't give the
correct answer. In case of the scan, we use the tuple descriptor
stored in relation descriptor which will allow us to take the right
decision in tlist_matches_tupdesc. If you try the statement CREATE
TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; in
force_parallel_mode=regress, then you can reproduce the problem I am
trying to highlight.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#32Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#31)
Re: [POC] Faster processing at Gather node

On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

This change looks suspicious to me. I think here we can't use the
tupDesc constructed from targetlist. One problem, I could see is that
the check for hasOid setting in tlist_matches_tupdesc won't give the
correct answer. In case of the scan, we use the tuple descriptor
stored in relation descriptor which will allow us to take the right
decision in tlist_matches_tupdesc. If you try the statement CREATE
TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; in
force_parallel_mode=regress, then you can reproduce the problem I am
trying to highlight.

I tried this, but nothing seemed to be obviously broken. Then I
realized that the CREATE TABLE command wasn't using parallelism, so I
retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
min_parallel_table_scan_size = 0. That got it to use parallel query,
but I still don't see anything broken. Can you clarify further?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#32)
Re: [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 12:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

This change looks suspicious to me. I think here we can't use the
tupDesc constructed from targetlist. One problem, I could see is that
the check for hasOid setting in tlist_matches_tupdesc won't give the
correct answer. In case of the scan, we use the tuple descriptor
stored in relation descriptor which will allow us to take the right
decision in tlist_matches_tupdesc. If you try the statement CREATE
TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; in
force_parallel_mode=regress, then you can reproduce the problem I am
trying to highlight.

I tried this, but nothing seemed to be obviously broken. Then I
realized that the CREATE TABLE command wasn't using parallelism, so I
retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
min_parallel_table_scan_size = 0. That got it to use parallel query,
but I still don't see anything broken. Can you clarify further?

Have you set force_parallel_mode=regress; before running the
statement? If so, then why you need to tune other parallel query
related parameters?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#34Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#33)
Re: [POC] Faster processing at Gather node

On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Have you set force_parallel_mode=regress; before running the
statement?

Yes, I tried that first.

If so, then why you need to tune other parallel query
related parameters?

Because I couldn't get it to break the other way, I then tried this.

Instead of asking me what I did, can you tell me what I need to do?
Maybe a self-contained reproducible test case including exactly what
goes wrong on your end?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#34)
Re: [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Have you set force_parallel_mode=regress; before running the
statement?

Yes, I tried that first.

If so, then why you need to tune other parallel query
related parameters?

Because I couldn't get it to break the other way, I then tried this.

Instead of asking me what I did, can you tell me what I need to do?
Maybe a self-contained reproducible test case including exactly what
goes wrong on your end?

I think we are missing something very basic because you should see the
failure by executing that statement in force_parallel_mode=regress
even in a freshly created database. I guess the missing point is that
I am using an assertions-enabled build and probably you are not (if this
is the reason, it should have struck me the first time).  Anyway, I am
writing steps to reproduce the issue.

1. initdb
2. start server
3. connect using psql
4. set force_parallel_mode=regress;
5. Create Table as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#35)
Re: [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 9:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Have you set force_parallel_mode=regress; before running the
statement?

Yes, I tried that first.

If so, then why do you need to tune other parallel-query-related
parameters?

Because I couldn't get it to break the other way, I then tried this.

Instead of asking me what I did, can you tell me what I need to do?
Maybe a self-contained reproducible test case including exactly what
goes wrong on your end?

I think we are missing something very basic because you should see the
failure by executing that statement in force_parallel_mode=regress
even in a freshly created database.

I am seeing the assertion failure below on executing the
above-mentioned Create statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
This probably means the server terminated abnormally

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#37Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#36)
6 attachment(s)
Re: [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I am seeing the assertion failure below on executing the
above-mentioned Create statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
This probably means the server terminated abnormally

OK, I see it now. Not sure why I couldn't reproduce this before.

I think the problem is not actually with the code that I just wrote.
What I'm seeing is that the slot descriptor's tdhasoid value is false
for both the funnel slot and the result slot; therefore, we conclude
that no projection is needed to remove the OIDs. That seems to make
sense: if neither the funnel slot nor the result slot has OIDs, then
we don't need to remove them. Unfortunately, even though the funnel
slot descriptor is marked tdhasoid = false, the tuples being stored
there actually do have OIDs. And that is because they are coming from
the underlying sequential scan, which *also* has OIDs despite the fact
that tdhasoid for its slot is false.

This had me really confused until I realized that there are two
processes involved. The problem is that we don't pass eflags down to
the child process -- so in the user backend, everybody agrees that
there shouldn't be OIDs anywhere, because EXEC_FLAG_WITHOUT_OIDS is
set. In the parallel worker, however, it's not set, so the worker
feels free to do whatever comes naturally, and in this test case that
happens to be returning tuples with OIDs. Patch for this attached.

I also noticed that the code that initializes the funnel slot is using
its own PlanState rather than the outer plan's PlanState to call
ExecContextForcesOids. I think that's formally incorrect, because the
goal is to end up with a slot that is the same as the outer plan's
slot. It doesn't matter in practice, because ExecContextForcesOids
doesn't care which PlanState it gets passed, but the comments in
ExecContextForcesOids imply that someday it might, so it's best to
clean that up. Patch for this attached, too.

And here are the other patches again, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-pass-eflags-to-worker-v1.patch (application/octet-stream)
From 29c38882a6f99ab65bca20a55287bc418b453088 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 10 Nov 2017 10:00:50 -0500
Subject: [PATCH 1/6] pass-eflags-to-worker-v1

---
 src/backend/executor/execParallel.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 1b477baecb..80778e8a61 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -65,6 +65,7 @@
 typedef struct FixedParallelExecutorState
 {
 	int64		tuples_needed;	/* tuple bound, see ExecSetTupleBound */
+	int			eflags;
 } FixedParallelExecutorState;
 
 /*
@@ -511,6 +512,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers,
 	/* Store fixed-size state. */
 	fpes = shm_toc_allocate(pcxt->toc, sizeof(FixedParallelExecutorState));
 	fpes->tuples_needed = tuples_needed;
+	fpes->eflags = estate->es_top_eflags;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
 
 	/* Store query string */
@@ -1042,7 +1044,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	area = dsa_attach_in_place(area_space, seg);
 
 	/* Start up the executor */
-	ExecutorStart(queryDesc, 0);
+	ExecutorStart(queryDesc, fpes->eflags);
 
 	/* Special executor initialization steps for parallel workers */
 	queryDesc->planstate->state->es_query_dsa = area;
-- 
2.13.5 (Apple Git-94)

0002-forces-oids-neatnikism-v1.patch (application/octet-stream)
From a19635139eb262f995bb8d9fb094df678bc1bcd3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 10 Nov 2017 10:01:11 -0500
Subject: [PATCH 2/6] forces-oids-neatnikism-v1

---
 src/backend/executor/nodeGather.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 639f4f5af8..2181349d8a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -110,7 +110,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	/*
 	 * Initialize funnel slot to same tuple descriptor as outer plan.
 	 */
-	if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
+	if (!ExecContextForcesOids(outerPlanState(gatherstate), &hasoid))
 		hasoid = false;
 	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
 	ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);
-- 
2.13.5 (Apple Git-94)

0003-skip-gather-project-v2.patch (application/octet-stream)
From d16e92a127c498a0bedaece74241d0e1b9003071 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:37 +0100
Subject: [PATCH 3/6] skip-gather-project-v2

---
 src/backend/executor/execScan.c        | 75 ++-----------------------------
 src/backend/executor/execUtils.c       | 80 ++++++++++++++++++++++++++++++++++
 src/backend/executor/nodeGather.c      | 16 ++++---
 src/backend/executor/nodeGatherMerge.c | 24 +++++-----
 src/include/executor/executor.h        |  2 +
 5 files changed, 110 insertions(+), 87 deletions(-)

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 5dfc49deb9..837abc0f01 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -23,8 +23,6 @@
 #include "utils/memutils.h"
 
 
-static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
-
 
 /*
  * ExecScanFetch -- check interrupts & fetch next potential tuple
@@ -237,8 +235,9 @@ void
 ExecAssignScanProjectionInfo(ScanState *node)
 {
 	Scan	   *scan = (Scan *) node->ps.plan;
+	TupleDesc	tupdesc = node->ss_ScanTupleSlot->tts_tupleDescriptor;
 
-	ExecAssignScanProjectionInfoWithVarno(node, scan->scanrelid);
+	ExecConditionalAssignProjectionInfo(&node->ps, tupdesc, scan->scanrelid);
 }
 
 /*
@@ -248,75 +247,9 @@ ExecAssignScanProjectionInfo(ScanState *node)
 void
 ExecAssignScanProjectionInfoWithVarno(ScanState *node, Index varno)
 {
-	Scan	   *scan = (Scan *) node->ps.plan;
-
-	if (tlist_matches_tupdesc(&node->ps,
-							  scan->plan.targetlist,
-							  varno,
-							  node->ss_ScanTupleSlot->tts_tupleDescriptor))
-		node->ps.ps_ProjInfo = NULL;
-	else
-		ExecAssignProjectionInfo(&node->ps,
-								 node->ss_ScanTupleSlot->tts_tupleDescriptor);
-}
-
-static bool
-tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
-{
-	int			numattrs = tupdesc->natts;
-	int			attrno;
-	bool		hasoid;
-	ListCell   *tlist_item = list_head(tlist);
-
-	/* Check the tlist attributes */
-	for (attrno = 1; attrno <= numattrs; attrno++)
-	{
-		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
-		Var		   *var;
-
-		if (tlist_item == NULL)
-			return false;		/* tlist too short */
-		var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
-		if (!var || !IsA(var, Var))
-			return false;		/* tlist item not a Var */
-		/* if these Asserts fail, planner messed up */
-		Assert(var->varno == varno);
-		Assert(var->varlevelsup == 0);
-		if (var->varattno != attrno)
-			return false;		/* out of order */
-		if (att_tup->attisdropped)
-			return false;		/* table contains dropped columns */
-
-		/*
-		 * Note: usually the Var's type should match the tupdesc exactly, but
-		 * in situations involving unions of columns that have different
-		 * typmods, the Var may have come from above the union and hence have
-		 * typmod -1.  This is a legitimate situation since the Var still
-		 * describes the column, just not as exactly as the tupdesc does. We
-		 * could change the planner to prevent it, but it'd then insert
-		 * projection steps just to convert from specific typmod to typmod -1,
-		 * which is pretty silly.
-		 */
-		if (var->vartype != att_tup->atttypid ||
-			(var->vartypmod != att_tup->atttypmod &&
-			 var->vartypmod != -1))
-			return false;		/* type mismatch */
-
-		tlist_item = lnext(tlist_item);
-	}
-
-	if (tlist_item)
-		return false;			/* tlist too long */
-
-	/*
-	 * If the plan context requires a particular hasoid setting, then that has
-	 * to match, too.
-	 */
-	if (ExecContextForcesOids(ps, &hasoid) &&
-		hasoid != tupdesc->tdhasoid)
-		return false;
+	TupleDesc	tupdesc = node->ss_ScanTupleSlot->tts_tupleDescriptor;
 
-	return true;
+	ExecConditionalAssignProjectionInfo(&node->ps, tupdesc, varno);
 }
 
 /*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index e8c06c7605..876439835a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -56,6 +56,7 @@
 #include "utils/typcache.h"
 
 
+static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
 static void ShutdownExprContext(ExprContext *econtext, bool isCommit);
 
 
@@ -504,6 +505,85 @@ ExecAssignProjectionInfo(PlanState *planstate,
 
 
 /* ----------------
+ *		ExecConditionalAssignProjectionInfo
+ *
+ * as ExecAssignProjectionInfo, but store NULL rather than building projection
+ * info if no projection is required
+ * ----------------
+ */
+void
+ExecConditionalAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc,
+									Index varno)
+{
+	if (tlist_matches_tupdesc(planstate,
+							  planstate->plan->targetlist,
+							  varno,
+							  inputDesc))
+		planstate->ps_ProjInfo = NULL;
+	else
+		ExecAssignProjectionInfo(planstate, inputDesc);
+}
+
+static bool
+tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
+{
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+	bool		hasoid;
+	ListCell   *tlist_item = list_head(tlist);
+
+	/* Check the tlist attributes */
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		Var		   *var;
+
+		if (tlist_item == NULL)
+			return false;		/* tlist too short */
+		var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
+		if (!var || !IsA(var, Var))
+			return false;		/* tlist item not a Var */
+		/* if these Asserts fail, planner messed up */
+		Assert(var->varno == varno);
+		Assert(var->varlevelsup == 0);
+		if (var->varattno != attrno)
+			return false;		/* out of order */
+		if (att_tup->attisdropped)
+			return false;		/* table contains dropped columns */
+
+		/*
+		 * Note: usually the Var's type should match the tupdesc exactly, but
+		 * in situations involving unions of columns that have different
+		 * typmods, the Var may have come from above the union and hence have
+		 * typmod -1.  This is a legitimate situation since the Var still
+		 * describes the column, just not as exactly as the tupdesc does. We
+		 * could change the planner to prevent it, but it'd then insert
+		 * projection steps just to convert from specific typmod to typmod -1,
+		 * which is pretty silly.
+		 */
+		if (var->vartype != att_tup->atttypid ||
+			(var->vartypmod != att_tup->atttypmod &&
+			 var->vartypmod != -1))
+			return false;		/* type mismatch */
+
+		tlist_item = lnext(tlist_item);
+	}
+
+	if (tlist_item)
+		return false;			/* tlist too long */
+
+	/*
+	 * If the plan context requires a particular hasoid setting, then that has
+	 * to match, too.
+	 */
+	if (ExecContextForcesOids(ps, &hasoid) &&
+		hasoid != tupdesc->tdhasoid)
+		return false;
+
+	return true;
+}
+
+/* ----------------
  *		ExecFreeExprContext
  *
  * A plan node's ExprContext should be freed explicitly during executor
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 2181349d8a..856db9e0f1 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -102,12 +102,6 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
 
 	/*
-	 * Initialize result tuple type and projection info.
-	 */
-	ExecAssignResultTypeFromTL(&gatherstate->ps);
-	ExecAssignProjectionInfo(&gatherstate->ps, NULL);
-
-	/*
 	 * Initialize funnel slot to same tuple descriptor as outer plan.
 	 */
 	if (!ExecContextForcesOids(outerPlanState(gatherstate), &hasoid))
@@ -115,6 +109,12 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
 	ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);
 
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&gatherstate->ps);
+	ExecConditionalAssignProjectionInfo(&gatherstate->ps, tupDesc, OUTER_VAR);
+
 	return gatherstate;
 }
 
@@ -217,6 +217,10 @@ ExecGather(PlanState *pstate)
 	if (TupIsNull(slot))
 		return NULL;
 
+	/* If no projection is required, we're done. */
+	if (node->ps.ps_ProjInfo == NULL)
+		return slot;
+
 	/*
 	 * Form the result tuple using ExecProject(), and return it.
 	 */
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 5625b12521..6da607b7c4 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -115,10 +115,19 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
 	outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
 
 	/*
+	 * Store the tuple descriptor into gather merge state, so we can use it
+	 * while initializing the gather merge slots.
+	 */
+	if (!ExecContextForcesOids(&gm_state->ps, &hasoid))
+		hasoid = false;
+	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
+	gm_state->tupDesc = tupDesc;
+
+	/*
 	 * Initialize result tuple type and projection info.
 	 */
 	ExecAssignResultTypeFromTL(&gm_state->ps);
-	ExecAssignProjectionInfo(&gm_state->ps, NULL);
+	ExecConditionalAssignProjectionInfo(&gm_state->ps, tupDesc, OUTER_VAR);
 
 	/*
 	 * initialize sort-key information
@@ -150,15 +159,6 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Store the tuple descriptor into gather merge state, so we can use it
-	 * while initializing the gather merge slots.
-	 */
-	if (!ExecContextForcesOids(&gm_state->ps, &hasoid))
-		hasoid = false;
-	tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
-	gm_state->tupDesc = tupDesc;
-
 	/* Now allocate the workspace for gather merge */
 	gather_merge_setup(gm_state);
 
@@ -253,6 +253,10 @@ ExecGatherMerge(PlanState *pstate)
 	if (TupIsNull(slot))
 		return NULL;
 
+	/* If no projection is required, we're done. */
+	if (node->ps.ps_ProjInfo == NULL)
+		return slot;
+
 	/*
 	 * Form the result tuple using ExecProject(), and return it.
 	 */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c4ecf0d50f..9c268407e7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -495,6 +495,8 @@ extern void ExecAssignResultTypeFromTL(PlanState *planstate);
 extern TupleDesc ExecGetResultType(PlanState *planstate);
 extern void ExecAssignProjectionInfo(PlanState *planstate,
 						 TupleDesc inputDesc);
+extern void ExecConditionalAssignProjectionInfo(PlanState *planstate,
+									TupleDesc inputDesc, Index varno);
 extern void ExecFreeExprContext(PlanState *planstate);
 extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
 extern void ExecAssignScanTypeFromOuterPlan(ScanState *scanstate);
-- 
2.13.5 (Apple Git-94)

0004-shm-mq-less-spinlocks-v2.patch (application/octet-stream)
From d776246453ad54e85367dca4b3c2490260187a29 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:53 +0100
Subject: [PATCH 4/6] shm-mq-less-spinlocks-v2

---
 src/backend/storage/ipc/shm_mq.c | 237 +++++++++++++++++++--------------------
 1 file changed, 116 insertions(+), 121 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 770559a03e..75c6bbd4fb 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,27 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +71,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +150,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +179,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +349,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +490,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterparty is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
 		/* Bail out if the queue has been detached. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +921,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mqh->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +953,18 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
+
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
 
-			/* Write as much data as we can via a single memcpy(). */
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1010,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1042,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1072,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1088,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1118,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1133,23 +1158,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 }
 
 /*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes read.
  */
 static void
@@ -1157,63 +1165,50 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes().
+	 * We only need a read barrier here because the increment of mq_bytes_read
+	 * is actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method should be cheaper.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
 /*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with the read barrier found in
+	 * shm_mq_receive_bytes().
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
-- 
2.13.5 (Apple Git-94)

0005-shm-mq-reduce-receiver-latch-set-v1.patch (application/octet-stream)
From 71247764266fce5097b473433a1822a9b165eb72 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 5/6] shm-mq-reduce-receiver-latch-set-v1

---
 src/backend/storage/ipc/shm_mq.c | 69 +++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 75c6bbd4fb..ceeb487390 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -142,10 +142,10 @@ struct shm_mq_handle
 };
 
 static void shm_mq_detach_internal(shm_mq *mq);
-static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
-static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
-					 bool nowait, Size *nbytesp, void **datap);
+static shm_mq_result shm_mq_receive_bytes(shm_mq_handle *mqh,
+				  Size bytes_needed, bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
@@ -585,8 +585,14 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_counterparty_attached = true;
 	}
 
-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it
+	 * too often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
 	{
 		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
 		mqh->mqh_consume_pending = 0;
@@ -597,7 +603,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	{
 		/* Try to receive the message length word. */
 		Assert(mqh->mqh_partial_bytes < sizeof(Size));
-		res = shm_mq_receive_bytes(mq, sizeof(Size) - mqh->mqh_partial_bytes,
+		res = shm_mq_receive_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								   nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
@@ -617,13 +623,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				/*
-				 * Technically, we could consume the message length
-				 * information at this point, but the extra write to shared
-				 * memory wouldn't be free and in most cases we would reap no
-				 * benefit.
-				 */
-				mqh->mqh_consume_pending = needed;
+				mqh->mqh_consume_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -635,7 +635,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(sizeof(Size)));
+			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -654,7 +654,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			}
 			Assert(mqh->mqh_buflen >= sizeof(Size));
 
-			/* Copy and consume partial length word. */
+			/* Copy partial length word; remember to consume it. */
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
@@ -662,7 +662,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(lengthbytes));
+			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -684,13 +684,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * we need not copy the data and can return a pointer directly into
 		 * shared memory.
 		 */
-		res = shm_mq_receive_bytes(mq, nbytes, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, nbytes, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending = MAXALIGN(nbytes);
+			mqh->mqh_consume_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -730,13 +730,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_partial_bytes += rb;
 
 		/*
-		 * Update count of bytes read, with alignment padding.  Note that this
-		 * will never actually insert any padding except at the end of a
-		 * message, because the buffer size is a multiple of MAXIMUM_ALIGNOF,
-		 * and each read and write is as well.
+		 * Update count of bytes that can be consumed, accounting for
+		 * alignment padding.  Note that this will never actually insert any
+		 * padding except at the end of a message, because the buffer size is
+		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		shm_mq_inc_bytes_read(mq, MAXALIGN(rb));
+		mqh->mqh_consume_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -744,7 +744,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 
 		/* Wait for some more data. */
 		still_needed = nbytes - mqh->mqh_partial_bytes;
-		res = shm_mq_receive_bytes(mq, still_needed, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, still_needed, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb > still_needed)
@@ -1000,9 +1000,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
  * is SHM_MQ_SUCCESS.
  */
 static shm_mq_result
-shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
+shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 					 Size *nbytesp, void **datap)
 {
+	shm_mq	   *mq = mqh->mqh_queue;
 	Size		ringsize = mq->mq_ring_size;
 	uint64		used;
 	uint64		written;
@@ -1014,7 +1015,13 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 
 		/* Get bytes written, so we can compute what's available to read. */
 		written = pg_atomic_read_u64(&mq->mq_bytes_written);
-		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+
+		/*
+		 * Get bytes read.  Include bytes we could consume but have not yet
+		 * consumed.
+		 */
+		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
+			mqh->mqh_consume_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1045,6 +1052,16 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
+		/*
+		 * We didn't get enough data to satisfy the request, so mark any
+		 * data previously-consumed as read to make more buffer space.
+		 */
+		if (mqh->mqh_consume_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
+			mqh->mqh_consume_pending = 0;
+		}
+
 		/* Skip manipulation of our latch if nowait = true. */
 		if (nowait)
 			return SHM_MQ_WOULD_BLOCK;
-- 
2.13.5 (Apple Git-94)
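The receiver-side idea in the patch above is to accumulate consumed bytes locally in `mqh_consume_pending` and fold them into the shared read counter only once more than a quarter of the ring has built up, so the sender's latch is set far less often. A toy C model of that batching (hypothetical names, not PostgreSQL code) might look like this:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical reader-side state: shared_bytes_read stands in for
 * mq->mq_bytes_read, consume_pending for mqh->mqh_consume_pending. */
typedef struct
{
    uint64_t shared_bytes_read; /* visible to the sender */
    uint64_t consume_pending;   /* private to the receiver */
    uint64_t ring_size;
    int      latch_sets;        /* how often we'd have signaled the sender */
} mock_reader;

static void
mock_consume(mock_reader *r, uint64_t n)
{
    r->consume_pending += n;

    /* Only publish (and wake the sender) once a quarter of the ring has
     * been consumed, mirroring the mq_ring_size / 4 test in the patch. */
    if (r->consume_pending > r->ring_size / 4)
    {
        r->shared_bytes_read += r->consume_pending;
        r->consume_pending = 0;
        r->latch_sets++;        /* one SetLatch() per flush, not per message */
    }
}
```

With a 1024-byte ring and ten 100-byte messages, this model publishes (and would set the latch) only three times instead of ten; the real code additionally flushes the pending count whenever the queue runs dry, so the sender is never left waiting for space that is actually free.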

Attachment: 0006-remove-memory-leak-protection-v1.patch
From a17d169442b13a5655c3d5f75941f30b7452e505 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 23:44:02 +0100
Subject: [PATCH 6/6] remove-memory-leak-protection-v1

---
 src/backend/executor/nodeGather.c      | 14 ++------------
 src/backend/executor/nodeGatherMerge.c | 10 +---------
 src/backend/executor/tqueue.c          |  2 ++
 3 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 856db9e0f1..30f4394b30 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -129,7 +129,6 @@ static TupleTableSlot *
 ExecGather(PlanState *pstate)
 {
 	GatherState *node = castNode(GatherState, pstate);
-	TupleTableSlot *fslot = node->funnel_slot;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
 
@@ -201,11 +200,8 @@ ExecGather(PlanState *pstate)
 
 	/*
 	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.  This will also clear
-	 * any previous tuple returned by a TupleQueueReader; to make sure we
-	 * don't leave a dangling pointer around, clear the working slot first.
+	 * storage allocated in the previous tuple cycle.
 	 */
-	ExecClearTuple(fslot);
 	econtext = node->ps.ps_ExprContext;
 	ResetExprContext(econtext);
 
@@ -254,7 +250,6 @@ gather_getnext(GatherState *gatherstate)
 	PlanState  *outerPlan = outerPlanState(gatherstate);
 	TupleTableSlot *outerTupleSlot;
 	TupleTableSlot *fslot = gatherstate->funnel_slot;
-	MemoryContext tupleContext = gatherstate->ps.ps_ExprContext->ecxt_per_tuple_memory;
 	HeapTuple	tup;
 
 	while (gatherstate->nreaders > 0 || gatherstate->need_to_scan_locally)
@@ -263,12 +258,7 @@ gather_getnext(GatherState *gatherstate)
 
 		if (gatherstate->nreaders > 0)
 		{
-			MemoryContext oldContext;
-
-			/* Run TupleQueueReaders in per-tuple context */
-			oldContext = MemoryContextSwitchTo(tupleContext);
 			tup = gather_readnext(gatherstate);
-			MemoryContextSwitchTo(oldContext);
 
 			if (HeapTupleIsValid(tup))
 			{
@@ -276,7 +266,7 @@ gather_getnext(GatherState *gatherstate)
 							   fslot,	/* slot in which to store the tuple */
 							   InvalidBuffer,	/* buffer associated with this
 												 * tuple */
-							   false);	/* slot should not pfree tuple */
+							   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
 		}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 6da607b7c4..1f9818233b 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -605,7 +605,7 @@ load_tuple_array(GatherMergeState *gm_state, int reader)
 								  &tuple_buffer->done);
 		if (!HeapTupleIsValid(tuple))
 			break;
-		tuple_buffer->tuple[i] = heap_copytuple(tuple);
+		tuple_buffer->tuple[i] = tuple;
 		tuple_buffer->nTuples++;
 	}
 }
@@ -669,7 +669,6 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 								&tuple_buffer->done);
 		if (!HeapTupleIsValid(tup))
 			return false;
-		tup = heap_copytuple(tup);
 
 		/*
 		 * Attempt to read more tuples in nowait mode and store them in the
@@ -699,20 +698,13 @@ gm_readnext_tuple(GatherMergeState *gm_state, int nreader, bool nowait,
 {
 	TupleQueueReader *reader;
 	HeapTuple	tup;
-	MemoryContext oldContext;
-	MemoryContext tupleContext;
 
 	/* Check for async events, particularly messages from workers. */
 	CHECK_FOR_INTERRUPTS();
 
 	/* Attempt to read a tuple. */
 	reader = gm_state->reader[nreader - 1];
-
-	/* Run TupleQueueReaders in per-tuple context */
-	tupleContext = gm_state->ps.ps_ExprContext->ecxt_per_tuple_memory;
-	oldContext = MemoryContextSwitchTo(tupleContext);
 	tup = TupleQueueReaderNext(reader, nowait, done);
-	MemoryContextSwitchTo(oldContext);
 
 	return tup;
 }
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 4a295c936b..0dcb911c3c 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -161,6 +161,8 @@ DestroyTupleQueueReader(TupleQueueReader *reader)
  * is set to true when there are no remaining tuples and otherwise to false.
  *
  * The returned tuple, if any, is allocated in CurrentMemoryContext.
+ * Note that this routine must not leak memory!  (We used to allow that,
+ * but not any more.)
  *
  * Even when shm_mq_receive() returns SHM_MQ_WOULD_BLOCK, this can still
  * accumulate bytes from a partially-read message, so it's useful to call
-- 
2.13.5 (Apple Git-94)

#38Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Robert Haas (#37)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 8:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I am seeing the assertion failure as below on executing the above
mentioned Create statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
This probably means the server terminated abnormally

OK, I see it now. Not sure why I couldn't reproduce this before.

I think the problem is not actually with the code that I just wrote.
What I'm seeing is that the slot descriptor's tdhasoid value is false
for both the funnel slot and the result slot; therefore, we conclude
that no projection is needed to remove the OIDs. That seems to make
sense: if the funnel slot doesn't have OIDs and the result slot
doesn't have OIDs either, then we don't need to remove them.
Unfortunately, even though the funnel slot descriptor is marked
tdhasoid = false, the tuples being stored there actually do have
OIDs. And that is because they are coming from the underlying
sequential scan, which *also* has OIDs despite the fact that tdhasoid
for its slot is false.

This had me really confused until I realized that there are two
processes involved. The problem is that we don't pass eflags down to
the child process -- so in the user backend, everybody agrees that
there shouldn't be OIDs anywhere, because EXEC_FLAG_WITHOUT_OIDS is
set. In the parallel worker, however, it's not set, so the worker
feels free to do whatever comes naturally, and in this test case that
happens to be returning tuples with OIDs. Patch for this attached.

I also noticed that the code that initializes the funnel slot is using
its own PlanState rather than the outer plan's PlanState to call
ExecContextForcesOids. I think that's formally incorrect, because the
goal is to end up with a slot that is the same as the outer plan's
slot. It doesn't matter because ExecContextForcesOids doesn't care
which PlanState it gets passed, but the comments in
ExecContextForcesOids imply that someday it might, so perhaps it's
best to clean that up. Patch for this attached, too.

And here are the other patches again, too.

I tested this patch on TPC-H benchmark queries and here are the details.
Setup:
commit: 42de8a0255c2509bf179205e94b9d65f9d6f3cf9
TPC-H scale factor = 20
work_mem = 1GB
max_parallel_workers_per_gather = 4
random_page_cost = seq_page_cost = 0.1

Results:
Case 1: patches applied = skip-project-gather_v1 +
shm-mq-reduce-receiver-latch-set-v1 + shm-mq-less-spinlocks-v2 +
remove-memory-leak-protection-v1
No change in execution time performance for any of the 22 queries.

Case 2: patches applied as in case 1 +
a) increased PARALLEL_TUPLE_QUEUE_SIZE to 655360
No significant change in performance in any query
b) increased PARALLEL_TUPLE_QUEUE_SIZE to 65536 * 50
Performance improved from 20s to 11s for Q12
c) increased PARALLEL_TUPLE_QUEUE_SIZE to 6553600
Q12 shows improvement in performance from 20s to 7s

Case 3: patch applied = faster_gather_v3 as posted at [1]
Q12 shows improvement in performance from 20s to 8s

Please find the attached file for the explain analyse outputs in all
of the aforementioned cases.
I am next working on analysing the effect of these patches on gather
performance in other cases.

[1]: /messages/by-id/CAOGQiiMOWJwfaegpERkvv3t6tY2CBdnhWHWi1iCfuMsCC98a4g@mail.gmail.com
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

gather_speedup_test.zip
#39Robert Haas
robertmhaas@gmail.com
In reply to: Rafia Sabih (#38)
Re: [HACKERS] [POC] Faster processing at Gather node

On Tue, Nov 14, 2017 at 7:31 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Case 2: patches applied as in case 1 +
a) increased PARALLEL_TUPLE_QUEUE_SIZE to 655360
No significant change in performance in any query
b) increased PARALLEL_TUPLE_QUEUE_SIZE to 65536 * 50
Performance improved from 20s to 11s for Q12
c) increased PARALLEL_TUPLE_QUEUE_SIZE to 6553600
Q12 shows improvement in performance from 20s to 7s

Case 3: patch applied = faster_gather_v3 as posted at [1]
Q12 shows improvement in performance from 20s to 8s

I think that we need a little bit deeper analysis here to draw any
firm conclusions. My own testing showed about a 2x performance
improvement with all 4 patches applied on a query that did a Gather
Merge with many rows. Now, your testing shows the patches aren't
helping at all. But what accounts for the difference between your
results? Without some analysis of that question, this is just a data
point that probably doesn't get us very far.

I suspect that one factor is that many of the queries actually send
very few rows through the Gather. You didn't send EXPLAIN ANALYZE
outputs for these runs, but I went back and looked at some old tests I
did on a small scale factor and found that, on those tests, Q2, Q6,
Q13, Q14, and Q15 didn't use parallelism at all, while Q1, Q4, Q5, Q7,
Q8, Q9, Q11, Q12, Q19, and Q22 used parallelism, but sent less than
100 rows through Gather. Obviously, speeding up Gather isn't going to
help at all when only a tiny number of rows are being sent through it.
The remaining seven queries sent the following numbers of rows through
Gather:

3: -> Gather Merge (cost=708490.45..1110533.81
rows=3175456 width=0) (actual time=21932.675..22150.587 rows=118733
loops=1)

10: -> Gather Merge (cost=441168.55..513519.51
rows=574284 width=0) (actual time=15281.370..16319.895 rows=485230
loops=1)

16: -> Gather
(cost=1000.00..47363.41 rows=297653 width=40) (actual
time=0.414..272.808 rows=297261 loops=1)

17: -> Gather (cost=1000.00..12815.71
rows=2176 width=4) (actual time=2.105..152.966 rows=1943 loops=1)
17: -> Gather Merge
(cost=2089193.30..3111740.98 rows=7445304 width=0) (actual
time=14071.064..33135.996 rows=9973440 loops=1)

18: -> Gather Merge (cost=3271973.63..7013135.71
rows=29992968 width=0) (actual time=81581.450..81581.594 rows=112
loops=1)

20: -> Gather
(cost=1000.00..13368.31 rows=20202 width=4) (actual time=0.361..19.035
rows=21761 loops=1)

21: -> Gather (cost=1024178.86..1024179.27
rows=4 width=34) (actual time=12367.266..12377.991 rows=17176 loops=1)

Of those, Q18 is probably uninteresting because it only sends 112
rows, and Q20 and Q16 are probably uninteresting because the Gather
only executed for 19 ms and 272 ms respectively. Q21 doesn't look
interesting because we ran for 12377.991 ms and only sent 17176
rows - so the bottleneck is probably generating the tuples, not
sending them. The places where you'd expect the patch set to help are
where a lot of rows are being sent through the Gather or Gather Merge
node very quickly - so with these plans, Q17 is probably the one with
the best chance of going faster with these patches, and
maybe Q3 might benefit a bit.

Now obviously your plans are different -- otherwise you couldn't be
seeing a speedup on Q12. So you have to look at the plans and try to
understand what the big picture is here. Spending a lot of time
running queries where the time taken by Gather is not the bottleneck
is not a good way to figure out whether we've successfully sped up
Gather. What would be more useful? How about:

- Once you've identified the queries where Gather seems like it might
be a bottleneck, run perf without the patch set and see whether Gather
or shm_mq related functions show up high in the profile. If they do,
run perf with the patch set and see if they become less prominent.

- Try running the test cases that Andres and I tried with and without
the patch set. See if it helps on those queries. That will help
verify that your testing procedure is correct, and might also reveal
differences in the effectiveness of that patch set on different
hardware. You could try this experiment on both PPC and x64, or on
both Linux and MacOS, to see whether CPU architecture and/or operating
system plays a role in the effectiveness of the patch.

I think it's a valid finding that increasing the size of the tuple
queue makes Q12 run faster, but I think that's not because it makes
Gather itself any faster. Rather, it's because there are fewer
pipeline stalls. With Gather Merge, whenever a tuple queue becomes
empty, the leader becomes unable to return any more tuples until the
process whose queue is empty generates at least one new tuple. If
there are multiple workers with non-full queues at the same time then
they can all work on generating tuples in parallel, but if every queue
except one is full, and that queue is empty, then there's nothing to
do but wait for that process. I suspect that is fairly common with
the plan you're getting for Q12, which I think looks like this:

Limit
-> GroupAggregate
-> Gather Merge
-> Nested Loop
-> Parallel Index Scan
-> Index Scan

Imagine that you have two workers, and say one of them starts up
slightly faster than the other. So it fills up its tuple queue with
tuples by executing the nested loop. Then the queue is full, so it
sleeps. Now the other worker does the same thing. Ignoring the
leader for the moment, what will happen next is that all of the tuples
produced worker #1 are smaller than all of the tuples from worker #2,
so the gather merge will read and return all of the tuples from the
first worker while reading only a single tuple from the second one.
Then it reverses - we read one more tuple from the first worker while
reading and returning all the tuples from the second one. We're not
reading from the queues evenly, so that the workers keep busy, but are
instead reading long runs of tuples from the same worker while
everybody else waits. Therefore, we're not really getting any
parallelism at all - for the most part, only one worker runs at a
time. Here's a fragment of EXPLAIN ANALYZE output from one of your
old emails on this topic [1]:

-> Gather Merge (cost=1001.19..2721491.60 rows=592261
width=27) (actual time=7.806..44794.334 rows=311095 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Nested Loop (cost=1.13..2649947.55 rows=148065
width=27) (actual time=0.342..9071.892 rows=62257 loops=5)

You can see that we've got 5 participants here (leader + 4 workers).
Each one spends an average of 9.07 seconds executing the nested loop,
but they take 44.8 seconds to finish the whole thing. If they ran
completely sequentially it would have taken 45.4 seconds - so there
was only 0.6 seconds of overlapping execution. If we crank up the
queue size, we will eventually get it large enough that all of the
workers can run the plan to completion without filling up the queue, and
then things will indeed get much faster, but again, not because Gather
is any faster, just because then all workers will be running at the
same time.
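The run-length effect described above can be made concrete with a toy model (invented for illustration, not PostgreSQL code): take `owner[k]` to be the worker that produced the k-th key in sorted order. A Gather Merge returns keys in exactly that order, so every change of owner between adjacent keys is a switch to a different tuple queue; very few switches mean long runs drained from one worker while the others sleep on full queues.

```c
#include <assert.h>

/* Count how often a merge that returns keys in sorted order would have
 * to switch from one worker's tuple queue to another's.  owner[k] is the
 * worker that produced the k-th key in sorted order. */
static int
count_queue_switches(const int owner[], int n)
{
    int switches = 0;

    for (int k = 1; k < n; k++)
        if (owner[k] != owner[k - 1])
            switches++;
    return switches;
}
```

When a parallel index scan hands each worker one long contiguous key range (owner = 0,0,...,0,1,1,...,1), there is a single switch, which is the pathological pattern above; a scrambled distribution (owner = 0,1,0,1,...) switches on nearly every tuple and keeps all queues draining evenly, which is what approach (3) later in this message is aiming for.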

In some sense, that's OK: a speedup is a speedup. However, to get the
maximum speedup with this sort of plan, the queue needs to be big enough that it
never fills up. How big that is depends on the data set size. If we
make the queue 100x bigger based on these test results, and then you
test on a data set that is 10x bigger, you'll come back and recommend
again making it 10x bigger, because it will again produce a huge
performance gain. On the other hand, if you test a data set that's
only 2x bigger, you'll come back and recommend making the queue 2x
bigger, because that will be good enough. If you test a data set
that's only half as big as this one, you'll probably find that you
don't need to enlarge the queue 100x -- 50x will be good enough.
There is no size that we can make the queue that will be good enough
in general: somebody can always pick a data set large enough that the
queues fill up, and after that only one worker will run at a time on
this plan-shape. Contrariwise, somebody can always pick a small
enough data set that a given queue size just wastes memory without
helping performance.

Similarly, I think that faster_gather_v3.patch is effectively here
because it lets all the workers run at the same time, not because
Gather gets any faster. The local queue is 100x bigger than the
shared queue, and that's big enough that the workers never have to
block, so they all run at the same time and things are great. I don't
see much advantage in pursuing this route. For the local queue to
make sense it needs to have some advantage that we can't get by just
making the shared queue bigger, which is easier and less code. The
original idea was that we'd reduce latch traffic and spinlock
contention by moving data from the local queue to the shared queue in
bulk, but the patches I posted attack those problems more directly.

As a general point, I think we need to separate the two goals of (1)
making Gather/Gather Merge faster and (2) reducing Gather Merge
related pipeline stalls. The patches I posted do (1). With respect
to (2), I can think of three possible approaches:

1. Make the tuple queues bigger, at least for Gather Merge. We can't
fix the problem that the data might be too big to let all workers run
to completion before blocking, but we could make it less likely by
allowing for more space, scaled by work_mem or some new GUC.

2. Have the planner figure out that this is going to be a problem. I
kind of wonder how often it really makes sense to feed a Gather Merge
from a Parallel Index Scan, even indirectly. I wonder if this would
run faster if it didn't use parallelism at all. If there are enough
intermediate steps between the Parallel Index Scan and the Gather
Merge, then the Gather Merge strategy probably makes sense, but in
general it seems pretty sketchy to break the ordered stream of data
that results from an index scan across many processes and then almost
immediately try to reassemble that stream into sorted order. That's
kind of lame.

3. Have Parallel Index Scan do a better job distributing the tuples
randomly across the workers. The problem here happens because, if we
sat and watched which worker produced the next tuple, it wouldn't look
like 1,2,3,4,1,2,3,4,... but rather
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(many more times),1,1,1,2,2,2,2,2,....
If we could somehow scramble the distribution of tuples to workers so
that this didn't happen, I think it would fix this problem.

Neither (2) nor (3) seem terribly easy to implement so maybe we should
just go with (1), but I feel like that's not a very deep solution to
the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: /messages/by-id/CAOGQiiOAhNPB7Ow8E4r3dAcLB8LEy_t_oznGeB8B2yQbsj7BFA@mail.gmail.com

#40Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#39)
Re: [HACKERS] [POC] Faster processing at Gather node

Hi,

On 2017-11-15 13:48:18 -0500, Robert Haas wrote:

I think that we need a little bit deeper analysis here to draw any
firm conclusions.

Indeed.

I suspect that one factor is that many of the queries actually send
very few rows through the Gather.

Yep. I kinda wonder if the same result would be present if the benchmarks
were run with parallel_leader_participation. The theory being that what
we're seeing is just that the leader doesn't accept any tuples, and the large
queue size just helps because workers can run for longer.

Greetings,

Andres Freund

#41Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#39)
Re: [HACKERS] [POC] Faster processing at Gather node

On Thu, Nov 16, 2017 at 12:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 14, 2017 at 7:31 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
Similarly, I think that faster_gather_v3.patch is effectively here
because it lets all the workers run at the same time, not because
Gather gets any faster. The local queue is 100x bigger than the
shared queue, and that's big enough that the workers never have to
block, so they all run at the same time and things are great. I don't
see much advantage in pursuing this route. For the local queue to
make sense it needs to have some advantage that we can't get by just
making the shared queue bigger, which is easier and less code.

The main advantage of local queue idea is that it won't consume any
memory by default for running parallel queries. It would consume
memory when required and accordingly help in speeding up those cases.
However, increasing the size of shared queues by default will increase
memory usage for cases where it is even not required. Even, if we
provide a GUC to tune the amount of shared memory, I am not sure how
convenient it will be for the user to use it as it needs different
values for different workloads and it is not easy to make a general
recommendation. I am not saying we can't work around this with the
help of a GUC, but it seems like it would be better if we had some
autotune mechanism and I think Rafia's patch is one way to achieve it.

The
original idea was that we'd reduce latch traffic and spinlock
contention by moving data from the local queue to the shared queue in
bulk, but the patches I posted attack those problems more directly.

I think the idea was to solve both the problems (shm_mq communication
overhead and Gather Merge related pipeline stalls) with local queue
stuff [1].

[1]: /messages/by-id/CAA4eK1Jk465W2TTWT4J-RP3RXK2bJWEtYY0xhYpnSc1mcEXfkA@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#42Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#41)
Re: [HACKERS] [POC] Faster processing at Gather node

On Wed, Nov 15, 2017 at 9:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The main advantage of local queue idea is that it won't consume any
memory by default for running parallel queries. It would consume
memory when required and accordingly help in speeding up those cases.
However, increasing the size of shared queues by default will increase
memory usage for cases where it is even not required. Even, if we
provide a GUC to tune the amount of shared memory, I am not sure how
convenient it will be for the user to use it as it needs different
values for different workloads and it is not easy to make a general
recommendation. I am not telling we can't work-around this with the
help of GUC, but it seems like it will be better if we have some
autotune mechanism and I think Rafia's patch is one way to achieve it.

It's true this might save memory in some cases. If we never generate
very many tuples, then we won't allocate the local queue and we'll
save memory. That's mildly nice.

On the other hand, the local queue may also use a bunch of memory
without improving performance, as in the case of Rafia's test where
she raised the queue size 10x and it didn't help.
Alternatively, it may improve performance by a lot, but use more
memory than necessary to do so. In Rafia's test results, a 100x
improvement got it down to 7s; if she'd done 200x instead, I don't
think it would have helped further, but it would have been necessary
to go 200x to get the full benefit if the data had been twice as big.

The problem here is that we have no idea how big the queue needs to
be. The workers will always be happy to generate tuples faster than
the leader can read them, if that's possible, but it will only
sometimes help performance to let them do so. I think in most cases
we'll end up allocating the local queue - because the workers can
generate faster than the leader can read - but only occasionally will
it make anything faster.

If what we really want to do is allow the workers to get arbitrarily
far ahead of the leader, we could ditch shm_mq altogether here and use
Thomas's shared tuplestore stuff. Then you never run out of memory
because you spill to disk. I'm not sure that's the way to go, though.
It still has the problem that you may let the workers get very far
ahead not just when it helps, but also when it's possible but not
helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#43Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Robert Haas (#39)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Thu, Nov 16, 2017 at 12:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I suspect that one factor is that many of the queries actually send
very few rows through the Gather. You didn't send EXPLAIN ANALYZE
outputs for these runs, but I went back and looked at some old tests I

Please find attached a zip with those outputs. The results are for head
and Case 2c. Since there was no difference in plan or in performance
for the other cases except Q12, I haven't kept the runs for each of
the cases mentioned upthread.

Now obviously your plans are different -- otherwise you couldn't be
seeing a speedup on Q12. So you have to look at the plans and try to
understand what the big picture is here. Spending a lot of time
running queries where the time taken by Gather is not the bottleneck
is not a good way to figure out whether we've successfully sped up
Gather. What would be more useful? How about:

For this scale factor, I found that the queries where Gather or
Gather Merge processes a relatively large number of rows are Q2, Q3,
Q10, Q12, Q16, Q18, Q20, and Q21. However, as per the respective
EXPLAIN ANALYZE outputs, for all of these queries except Q12 the
individual contribution of the Gather/Gather Merge node to the total
execution time of the query is insignificant, so IMO we can't expect
any performance improvement from such cases for this set of patches.
We have already discussed the case of Q12 enough, so there is no need
to say anything more about it here.

- Once you've identified the queries where Gather seems like it might
be a bottleneck, run perf without the patch set and see whether Gather
or shm_mq related functions show up high in the profile. If they do,
run perf with the patch set and see if they become less prominent.

Sure, I'll do that.

- Try running the test cases that Andres and I tried with and without
the patch set. See if it helps on those queries. That will help
verify that your testing procedure is correct, and might also reveal
differences in the effectiveness of that patch set on different
hardware.

The only TPC-H query I could find upthread that was analysed by either
you or Andres is:
explain analyze SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET
1000000000 LIMIT 1;

So, here are the results for it with the parameter settings suggested
by Andres upthread,
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 8;
with the addition of max_parallel_workers = 100, just to ensure that
it uses as many workers as it planned.

With the patch-set,
explain analyze SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET
1000000000 LIMIT 1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=430530.95..430530.95 rows=1 width=129) (actual
time=57651.076..57651.076 rows=0 loops=1)
-> Gather (cost=0.00..430530.95 rows=116888930 width=129) (actual
time=0.581..50528.386 rows=116988791 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on lineitem (cost=0.00..430530.95
rows=14611116 width=129) (actual time=0.015..3904.101 rows=12998755
loops=9)
Filter: (l_suppkey > '5012'::bigint)
Rows Removed by Filter: 333980
Planning time: 0.143 ms
Execution time: 57651.722 ms
(9 rows)

on head,
explain analyze SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET
1000000000 LIMIT 1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=430530.95..430530.95 rows=1 width=129) (actual
time=100024.995..100024.995 rows=0 loops=1)
-> Gather (cost=0.00..430530.95 rows=116888930 width=129) (actual
time=0.282..93607.947 rows=116988791 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on lineitem (cost=0.00..430530.95
rows=14611116 width=129) (actual time=0.029..3866.321 rows=12998755
loops=9)
Filter: (l_suppkey > '5012'::bigint)
Rows Removed by Filter: 333980
Planning time: 0.409 ms
Execution time: 100025.303 ms
(9 rows)

So, there is a significant improvement in performance with the
patch set. The only point that confuses me is what Andres mentioned
upthread,

EXPLAIN ANALYZE SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET
1000000000 LIMIT 1;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Limit (cost=635802.67..635802.69 rows=1 width=127) (actual
time=5984.916..5984.916 rows=0 loops=1)
│ -> Gather (cost=0.00..635802.67 rows=27003243 width=127) (actual
time=0.214..5123.238 rows=26989780 loops=1)
│ Workers Planned: 8
│ Workers Launched: 7
│ -> Parallel Seq Scan on lineitem (cost=0.00..635802.67
rows=3375405 width=127) (actual time=0.025..649.887 rows=3373722
loops=8)
│ Filter: (l_suppkey > 5012)
│ Rows Removed by Filter: 376252
│ Planning time: 0.076 ms
│ Execution time: 5986.171 ms
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(9 rows)

so there clearly is still benefit (this is scale 100, but that shouldn't
make much of a difference).

In my tests, the scale factor is 20 and the number of rows through
Gather is 116988791, whereas for Andres it is 26989780; moreover, at
scale factor 20 the query takes some 100s without the patch, while for
Andres it takes about 6s. So maybe when Andres wrote scale 100 it was
a typo for scale 10, or what he meant by scale is not the TPC-H scale
factor; in that case I'd like to know what he meant there.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

gather_analysis_all.zip (application/zip)
2����B�}>���������}�0��y�
�r����g&>L��]��#$W1������E���%���Dj��:�}����e=Z���/�e���J|b=�>a�w��O���>pa��WG��n��z��D��-��������4�JIM�Zk �$n�w5��u�/����{�����7��x���>1����hUo3�H5B0��Wtm���o�F�A"�o����p.�$�B2��(�Q�\,�H������������.X�wM�*��j@Rd�������?w��M�����|�W�����������^����28����d������e���x����G2��8wk�z��OPK�d�6"PKqepK(gather_analysis_all/gather_head/17_2.outUX�Z�:
Z��U�N�@}�W���J��j{��R��^(� ��-�*�p�`o���;�	!B))��^��sv�,�������_pr�=~�����O�G�8v�~V��4b2$BR�%�<�-:n��uoA����$����0�pB����$�&��j�L�>��I����an���R\�eq�7/�C84nd�����������B1M�W�Ve���e\d���8IL��A�c;��4��6n�NL�b��I���r>��R���'S�W5Tkd�h�m���p����hQ�����'�
U%ZG��"Q(*$�(V!�,���)�[��f��y���Lz�����I����1I��Of���bT�}@O�2�@)��E;)7�h0�'4
]OJD�V���#����S{lK�K��X&Ilg���&�,��fN�'r;L�sF��)?`�c����a�7L�Xgs���M:@�;{~����v�r�����O(H�����
��vq�=�����r��uO0��qv�
~9[���g:�N��Y-GH��{�[�K]D���A�$]o�#����r]����e�����+�e���J|b�>a?�w>���m����W{�����{�\�_3�P�Rl��4�a�w�5@�!Q��N�2�l:��v��8>/K����:������$J�3�D1��T-ksE�/����X��F[��Q~M���9-�t����@�����d�Bb�\a'.����g�`<0JX��c�J�����DY�]�z�[�I������n�[(����������jmc�YG��_�^f�>"\1
8�����Q��
��+SD���r�[���PK����7"PK�epK(gather_analysis_all/gather_head/17_3.outUX��Z�:
Z��V�n�@}�+F�C�RW{�zQ�D���Fi�����
���K����	�$�����+{v���3���������O�M|�<�O�'�<��q^��4fRai��B�/�'?�����M�'s7`�*#�V!�u�)��>{���s�WIf�����tZ�����$7��}��U^[?s�v���h�V�A��V(�c�%���2lR{���W�(�,�Y�&}�wy��e6��u����'�e������i(4���7�i���24���n'���`���ppA(!����� ��wIQ#�Su	p��{�'�Y�]����"�Q��r+@�d.�nN�����*�ol9ao�Af�#�6�������i��Q�A�a[L�A�������}G�@��XX6M]
�+\�myU��9E��Y0�0a���4�o1����E�[	�;L����]�.�M����j�������b<�E��/Q��8��E������^/�7~�=G�p���7,��U���}����jGHiTEl�A).x�(I����PUO���]�/�z�_9_��a�sT��q���y+�T���&�����
��E%7���D
��ZJB�ap��z8�Qa�I#�L�`��|���n���>���~����7Bo��(I�H�\��##���N_Q]��d��.�����P�|�M�u��,�l
��z�������b�E�E��6G�����]�X����J3$EfkP��.��I�N�t�ui3��<����
���6�nn|z�C�]#���X_����.�cD�P��8�v��OP�����":��n������PK����;"PKWepK(gather_analysis_all/gather_head/17_4.outUX�Zm:
Z��V]o�0}����H����E�`|�ilB���k�6Z����V~=��G[�A`���+���9��8��>�\|����������8�������
8��!��z,)��m1`pO�l x�f��&�����E�'d3�%Y��g��2�x�&��[N���g���tkszh�2<�7��l��m@t��V�ey��BP�k�g%�E���e|��k�p��4��>��f��e:�������������r�NCI��9C2�xWC�Gf����N�'68X��0�E��B�3�U�*(�>�5V�F`)�e��7`�x��)�����JE#���+@�6vvN���/�y�oM1+��DO�2&���U��<L��rM�+N�R����V��X#z���_1��7L�`�$�	\�op96)d)�5s�<��`x��������cJ�'��#4��������*7�i>=��'L=������{0<{����Y����i_.���rxy���z�v����Z58��L��4]��98B� $�{\3.u����������ReoT����:�X������]�i)��7�����y�+�\���:���m����GQ���6���0��P�R*��Bt(� ��I�z�e����>�U�`��Y2O}����[�w��I���(�:�i�e�j!���+��[.���Y��e4j%�G�.�������)���Q�����b�D�\a6�c����>�����j@�-
U���.y��5n���-M�b����-�����vw�sz�6�q�ep�����e��`�����y���;;^�e���(�+,��nU�^��PK�+6"PK�epK(gather_analysis_all/gather_head/18_1.outUX&�Z�:
Z��V�n�0��+x�����tI4)�E����DLD��j�~}�Z��Q'h��i��y��pF�;��u��;::x��Ah����	:�k��E��3k��V`��-����jN	A��2�����i��:�P�k7�ZK���������)+�sX�M@�/:)�;rJ4�����H��C1Y3.�x����������.�Iv^�����'o�o��2	n����U������gU�����N�����;���T�W���;���DPMs$L��s�#a!���'�Y/23#�[{_�y�;��P�]��E\���m��h�?������B�4���,��A�am�!�X������� �
��8�.b��Q��R��f����X�8f,\`������(�c� �6�Fq6
������`��y���n��p� �QZ(��AhFjl�l6�$Ox�63�;�zAqvg-�e�|\9C��^Oo�tQ'y��j�D�8��f���+���A;t��'$���&�`F�]��9y���+�(+��)��$������*�Y_�d+��M�#�;����erjL<:�}+@��BGY����x��AR���	D���2x�:Z���^3������kK��G2���~9T}��WB�)&�Xk��G�H��e��]��4�Q�o"\��C`�bh;�0	#�%��N���'f1L�����������,��|�.[�u���n�,�A����;��%L���Ad�C ����O���pfv�X�M�C]N�����I{���"��
���@X�H�����	o	�'�Au3���f�w���������b��F��9�f����]I�S|\�D��zG�7(f��y���i:Z$�B��!������K�����
%��Xh*��v{��PKj(�|�PK}epK(gather_analysis_all/gather_head/18_2.outUX �Z�:
Z���]o�6���+xWh	�����#���E�`ve�2��D��d�~���GI�`�o�H"������������g���w_bBfo��cFN�m��e� �Yf���vsG�#U��^r���l6K���<IC��$d[�tZXMc��S`���,�pv4C���%���	(83@5��*���O����b2=�m��������W5-W�I~Ue�'��?����u���/>l���\7YzY�_���pW�/��_G�_�z��������$7�:�9��QC��XK�������l�:���C��MWqz�o������������	�0���2���1&p����{�Z+���@#��K���Q�P�q���'W��5CZ��p�j��Z}���+��8-�T`B��iA�3G+j�x��Q���H��?]��B<v�j#c(5��N<i-�P�20/��6���Y������������W2�����N�MR�,��������bQ4x��`�gHG�����I�v��5p�>1'�������$����(&��R�f*����n�^�1�u�S*9s�j�������w���D�����IQx������$M�n^�A4�4�B���u��<��<)�{&���>e����Q��+���
z��3��U�V$y�sr���y��,v���a4�`%�-���j���5\�Y��4^��J��
\��9�D��k�)m�������
M���G������%�~�@���/���i_�`�X'�CY`]���G��I����vh�;���,E��|x"'\��\�����#������������j�����9�{���
��y���iA���z1����ip��F�1*^�m�m=#�7>mB����A�������v4��PK&sW*|�PKtepK(gather_analysis_all/gather_head/18_3.outUX�Z�:
Z��V�n�8}�W���%x�H�n7�.�i���������,%�&��uM9v���������s�hF�������Wtv���oL������������<-�p�8��l���bjQY|��)!��_����Gh���N2���-*-��R��=�cXV��;��j/_!tQ�;RrJ4����D
e�f%cV���������-PQ�\Y�b��d��O�������f�J�������X-�M����a��m�N����������Z���,�%4R&�j����)�`��l@��L�������weQ_����
���8]���������#���
n�N!EO�(�
��)!�VM�b��2�Y����X�N'������� )����e�k����`���,�������>V	"$flg������B���[�'������@���`l�Jb-' �I�5�����]��?3�;�fIq�����[�W.���7�'��I|�;B��N���"��5��� ��:�W�����!���f����.	kW�EPVjqt��D�����T	�z���20b!H30�������Qk��K���d/+t�%y�������&u���q@�,)����U���)��4��������$��+aT1���vq�+������X��+#OH�d�����Ai��"2�K��C00���	��'�j��4�oIA���h���8Y�[������m���|~����5���G����t������'���s83W�������q������m����Cv��_	i�_:�����I�y�T��L���]zb�>��W��A�A����
�:����~�|��a�&�@�(f������i:Z$�B��s��f����u����>h�5��s������'PK[T;�{�PKyepK(gather_analysis_all/gather_head/18_4.outUX��Z�:
Z��V�n�F}�W�[d�Y��"@��I�:�k�(�$���Z�"mr������j����C�E+R�s���3B����~?�������� �x��_t��> �L�*�g������[L-*�������m��5=A�$
u����nm%��$��V�1(+���,@����,�#	9%�aE�XI�P���b���t�v5Yu�+T�[WV���"$�M�S�~:�������l���?��+�+t[���jx|v���S���q�u��sR��_]��
	A5�VG��K0gs	L)�	+�;��Ln����oZ�i]�b�J�n����2�G����W�Eg���a  X6e#����	B���r��jY(��1m����x�����'����e�k�c��X8o�0��^Vt>���U�����$ e�{���=�	h����I��q�6l��1��A�	�f n��X�<�x3�����7gG�xl}�Yp�
-�z�|t�m��������T�Y����Og��v�.��P����d��2��='��Sv��k�� �b���"���P|�Y�[rP���LH�������x��������
�gI�;�s�u�gI���WE:O�����o���f�j-
f�f`JJ�s���6`\����nZ(_�����X��+#OH�d�����E�i��"2<H��C0!�'0�m'L4�QJt�P~ 
	�S�`����)�!�B��/K�K�uw-����U7_67 B��~�JHL���Ad�C#B$������s83�+���P����8��z�h/6�'d�;�����4�8���P���5�Q�=0�}���I�����m��:r�+�b�����~�|T7D�	�g}D1{X�'�i����E1+4oh�
��:�si<�h�X-����]�v��,�PKJ,���PK�epK(gather_analysis_all/gather_head/19_1.outUX"�Z�:
Z��V�n�@}�+F�����x}A%I(B!��z��
9f����M����]ls5���}��e������xU��C��N�����G���8���t��q���G���b"EE(����Q�#�I��V�l;<�]�lJ���"S�J�����@|W
�PN��1�v�w
��8�c��!�R��LB�l>��2���5S�����#��0S0���'?��a=��<:�����c��3�-��{v��`�9/}��,��]��&2E��$��"-,8ti��:�0CW1�*R�t��b�ul���6�i������2�u������ �Y���E0��a3��FH�[Bj���3�X���$�1����r0������+��^�#qv���P:���K��]�L�0-
���q�yB�uqt�����.�f�Df�7������J��V����/_+��f$������>�pP��p����FK��	�I�4�|^����<��G���Bd	
������Y`1����@�N�oB�w����03e�<���>%��#�������h�q:J��8�h���	i�����
�b�������2�Z�e��bB��������{�,�-3d��|wMX0�Gt��F�R�o�o'y�yc�KU\6;���>���k�7�uyH����8���=+�����4d���y��{	/W�:\��]��d�%�����rEea��w�DR�7���Y�������r|g{W��a��l����ie�[�S?;Ls�}h�Ua��?7���A�#�0�
�|�N�����S%���oe��7�
�PKi�MPK�epK(gather_analysis_all/gather_head/19_2.outUX�:
Z�:
Z��V�n�@}�+F�����x}Au$�P�B�Ur�
V1�c/M����]l�&��j�<�h��93>2�1^U�����B�]��o&���Q8=�1��*�m6a��1�uCSL���Y��1�1<�!�U�e��S��&�6M
iF(M��^���R�rz������S��F9�i�)�ifA&!���h�@�Z�)����T%����d�%�)��iC�s|�k���h;S�g�H�]'�L���K_�0�t%W/E���f��^d��%��9B[��D���T#U��Sl���Q��I�DM�&${M���}�}��!�!�,�(b8D�-Y���6�b(����B�1��Jai�^��x�<N���� �+}f�@�s)���]��C��t.�7�j���;QZ�����^�������5\�������o>'I�~q�fW����U_�V��H������dpP��p����zS��	�I�4�|^����<��G�9�Bd	
��Z�,W,���1��X��I�M,���b��e*����-H������>
<��t2�{:��a(�v~�� 	�\|U����0G�XKJ��j���G�	���VS�f����P[f(����{�x��I0����[�����^#{:����h�r���v�c��t��Mgs=$����9���9+����Nh�����1�^�H�p�L�.QI�%��wIu���4�D�|�H���U���m���*�wvwu��6���V\,{`�V���?���v.����*Lq�r#�NT
T��*L�4��;�L�]��i�*^��oe��xT
�_PK"L�w#MPKtepK(gather_analysis_all/gather_head/19_3.outUX�:
Z�:
Z��V�n�@}�+F������6F%I\�B�Ur�
V1�c/M����]l�&��j�<�h=�9sf|�1^U��`�n��nt�7�c��(���xQ�6�0Pv���
S�U�U��L�(x��
<�!�5�e��S��&�niU���,Q����P|W
�PN��1���w
��(�#��!��Y�IH����4Z3�`)a�%0�R0�J���$>�=�b�z���a
�]mg�������D�	�z�f����R��	��d]/����C����-
��3A���JOL#�V6��jD"����TkBr��Tp<�z���wB!�����`�2�|:��T�b�
B��0�M�EM��t��$�1����r8��J��A�@�s)���]��C�C�\&o�R���Bw�DiQ8p�;�{YG�P.�(�������'2;���$���U�]5�?W}�ZY�6#��D�� ���'�k��M8�f$d&��L�y���w� d?��\�%4D�j!�\-!Fe�a��@�N�ob����5�*6p�z��!}J�r3l�4�D3���@�� ���\������tk��U���������$��e���G�	���VS�f����P[f(����{�x��I0���5Z�������=��G��y4u���K���h��������t6�C"=L�3�,���!*���F�]��(s/)��n����KT�{���]��VT�����I������\c�m9�]e����0�������el��v���~v��e�u�1j�gn��I�G�L��O��r&�.ybi2t"��c�^�B�PK}�$� MPKpepK(gather_analysis_all/gather_head/19_4.outUX�Z�:
Z��V�n�@}�+F�������F%I(B!��Ur�
V1�c/M����]ls5���}�F�x��3��������}�N�����G���8���t��q����nTul"�"�fD�c\W��
����(;.�:p6�u[SC(���^���R�rz������Sh�F9�Z%h���&!�[�h���5������p�bfp$����%�)��iC�s|�k�o��8S���H�='�L�����71��S3�@X��&�lU3����E����#
3|�z��!5��j���:��>���������_��<�A�>@�u||�:B#D�a�JHC���k�0R41Qb���ja�x/FI�c�Q
��p��>��R�F�R������P���L��T��������p�>w�/��.�n�\�Q�_�E��,�����s��Wiv�*��_��ke~����:qtcCuO��K8o��,�����N3��e
�����xtZ�"Kh�$�Bf;�P��=�i��N�ob������Z&6���l�C����4f���i��f�����A<
C�������tk�
V�������� JR�-�eX�-&$/_�M������Bm��0;��������`HSk�����v3:�C��y4u����f���iC���������Dz�:>g�Y�g%CT��	��;�u^ �^R�+�=\�C!�.Q��KTu�%�rEea���]"���Vq��+�6-'�������s��s;[r���mZ����������4FM������;iP�H�+�`��D�)g���o�'�LK�_�cf_�B�PK���l$WPKdepK'gather_analysis_all/gather_head/1_1.outUX�Z�:
Z���]k�0���+�]X��Y6������#KcWEs�XD�SY^����V�v]�����HG�}�^�Z|�r1������;������T�����1�BBQ��0gt}�d�����1�xnZ���Rd���b� ����U��{��w���J>��u�9_��Xq#��a���x'��/	p�W���>u��iu�C����d%�M�<n�T/j}�G��r~xD#��N#���5����,�I��0E�L�������_����� Z���!^qS�d���=Bz�B�����{)]|��Z�f�W�������������j���V��/y
?a��4:�k�,r����r�-a��������+%,�-,r^A]A�.�(>���(%��[����O|�����$��u�Z4���AH0���B�R�S�����e�����$alG��8K���E/7V��nkz5�PW��Zs����?����jG��Q�9O�j��AQBi�|�K����yn������VG�yw�y�PKz�	jPK�epK'gather_analysis_all/gather_head/1_2.outUX	�
Z�:
Z���Qk�0���+�[X�$��d�AmKK�0����h��l��������8��&
��"c����Z}�|>�
������	����T���rS�	N9���	
�a�XsWO0���+&$J�0�k��J9I����^4,����l��8��8}p�*����Kk���je�J8��@2���d�����3�W�>��������w-Vo��V���pM�{{�za��%I���<"
�q�������Fi��+�2��F���3%Kc�IQ�~�T�"^
WH�d��v�$����!Ki�C�l!	�K��c���0�����N������[���NysD]<�5���['�YKY�x�!�%�Q��[�x�?�o/��ZjX�[X��SA+WN�
Q{#1C�M���"���/~��E�
6b^Dw��b���(��.�v�f0�7u�6�v�o'p�9g��?���$�����x_>�����������#�}��%|����K�8�3��Uwh�'A��|�S��S~x��6�QH���_G���� �PKI&�!
jPKhepK'gather_analysis_all/gather_head/1_3.outUXW�Z�:
Z���]k�0���+�]X�d��������#KcWEs�XD�RY^����v�t]����������x����b�
���O�w&�7�����,�e��S%�P1�P?�1}WM	�����~��0����+1eQP�"��~��(���;�\;&�.e��|pet�=_��Xs+�PL�����A}N@me�(�SP7F���?_�qJ������a{�z�����O_���^#�s������`��r����*��Zf��-
m�IA�y�4�A��6�� A4>@�p $��2�'���]|�f#Ls��R8������,?����sc���W�s^���9�x���7��&~�Cw���r�Q.��P����x	��&]ZQ�|agQFb�7n��C`?���/|������G1F1�wY%��ha�RYaR��*��U���S8#IO�0N�q��Mm�^l�.g���j�A�b@���g��b�����k����<%�u{h
,
�p]��%��J��n��Q�p���H�����PK�8q�jPK}epK'gather_analysis_all/gather_head/1_4.outUX�Z�:
Z����k�0���W�[X���2������%KeOEs�XD�SY^���;[��vMFv|�����t2�����g�o0��^�]Ng���3YH0���Li�(
<7"�;�	]��S
wre���Gc��4\����Fq�.!��n2�����s�`;&�e��|p��f{�^k��F���I`�E�
w��X�*�gq�����4������)Y��p������e��(�l9/8�1p�kl]6p���ZW�h��b.���U
���6u'�R��L~���<�E��&�	��� �A=}2���"{)�]Wz#t
s��R�����o�,?����sm$������Y#�;?
��5���YDt����0�C�O����+%,�-,3^BUB.�(>��E��x�����|�1���gx�S��}�FI���%4��"�v.�:����s�]��?��X2!� $��I������.��tj��*���8�?�
����&	�cgJ��.i
�M"v����#�y���{0��_G������PK���jPKzepK(gather_analysis_all/gather_head/20_1.outUX�Z�:
Z��Xmo�6��_�/C�a!�.�X
�k��$A�l��Pm""K�$�I?���(J�,�����0����������~����������������o�i��J����(O8�FH�5�x����sqB��hV�O$�a8-�a��haO5RQ���x��,��4���<~��U�#u���lR�G��w{?F�2������$��v���r����L��s4�aVT���E��j&o���8(��(���Jte�-���RsI1�-lt6)5�X�M2�x��pE�t~T�����c4l�t�7���8JlT����=)��]�O�������	�������D��`^U�%V;G@P��
�#8�T4��v�+�W�]M��	J�2J��B���.���v�`���?�N�p�����p���x�1������]�iLl��k2�w����k��&��'���Hh^��C*�����}N�X%�����[�S��\,�f� ��8Y��Ir8�B�"
���]�N��n;i�%V��T���
�������F�H��p����a9_���iv �$k���5��j��^|���76/�E&�R�D����p�L�;�o���6h�
��45Z�-��� �ZN$/V���b�+�
�_�nJ���������d�AP�&�F�������d�x0,�O0�oZ�Y��8���^��_�-�[ �J���r�
�	����t���J�
�lt��V&x2N?Vt�3�45�vMo���$�]_�>���NI��)��!^��Jg�H�������~�t�a����
�������2
s�;n]�q��C��Q��C�fU�����+���������wG����o����9�5J�������a�`����620�
��Wv�R�H��Q,��7
2��0��\�5�}�.�������7ZKe��$k!�J�Y�r��Z?*{}��g�6���Q�K�-:����q��
)`��|��!����k �@�+59�����*�����&��N�yHU	Eo��R�\��Y�=���������U�����C�s��MT.��'����b��(TZF��ZSM��w�*�b���1�D@[�J�o������wOo��p{he3�,_���8&>Pz��z}����X��~"d\}\�$E.2@	Y�6�%M���vW���Sh�����8!��Tr���#,���_���5T�<�6��t�s	���]%��9�������/\vw�|��W���R�{��X�tj������nL����P�����u����S��/�q����VoXL�Oa���� &KHn���_��=*��aK��2L����W��U$�wr�B2�n��E1@���tY�h�_��@��~J�5�PKT�<fmPKyepK(gather_analysis_all/gather_head/20_2.outUX)�Z�:
Z��X[O�F~������*��xnQA��n�@ZU�y��Hlc;����{�c'NC��z@�2�9���u�������.��\���������yT �'yq�97�X3�W�(K��'}�&��D����b�P��	c< �@b�|V�G�$I����"t�d��.h����R*��>Q�H�Yd3���pn7���b�L��H��4�a�����y���L��]}��\���t*���G��$�klL���6��'�	�V�L	��I��7\���R�������8���}@'h�6*�����]�@�������]�Zq���6������3�h*�b8�
��b�M��t^���w�z�(�QQ�, c�c��������FmY������������S:����_o�]���n���{��.�(�A��~T{{����������?�4X��?$os�!��������)�'.+�|�g�^�vr8�C�"
g�W��NKS��&Z`Y�
Y��r����Y� E��JW0�G"�CXLW�m3���N\>3�]e�ZS�ym���`'>�N�[���r��R;"d}�Y����'�wd/�U
����
-�D�.�4���.��B��d�+������ZE�3�1}C�jj��j8�F&��0��a�oZ�i�8�?�J[��y���(���/�'�!���!��Q��6.�!�
�FW�XK��������C�c����-���z{5�4��.�C��������;�ew���l��M@�����;��.J/�,���l����X���H#�>�n�Q�BY5L�8h�,[t?-{�`8,�}��}CG�ltc�;�o�����+C9i�I]����0�b�s��	�E���h�j&u��qh5�����T=�
�dOR���%����Mfo����R��~!�c-�&�9�lzk������.�?l��Q���
:���U�z��J��MU��:x>49Is�<i�qR���eU�GJCS] �l�y�A��(���)��+hS�:����z���������������y�0E���{�P�@�����(��H�/e�%������
s�8$]%�`��BRy��0v�o���94���+��pD�����/��v��\�C�e?���_����S���)��,����v�b�K������<w�B�/�����	������W���c��[�L6�(���_���9���I�?���w���j>J�����p!�5�����M��gLyaB6/L��R���:i��!���>q�����l����a4�+`@�P��1:���{�/�����E����kT$=�wr��0��y�C���xQ^hU�a\��b��}�����PKRH�`lPKNepK(gather_analysis_all/gather_head/20_3.outUX%�Z\:
Z��X[o�6~����{X�/�R�]���Y�a����D,D�In�>���PK���q�am��L���;W���?�zs�]�����!48���:a��p�����
)�a��(K>�g}
g��L��bD���1f�������Y%I
��0x���$�F�HG�&��������4�B��|�1��+��l��$=�@s�y��;�H2X����v��r�����9�D�n���0n�1#
�S��F��&��k%+hT��z
W�lAW�R��aT�l������=�3����Q�����AW�#������f�XE��\��D��`VU�`V�@P��	&Bc,$�B5��n���kw���A���A&q�=6X}�D���<�F���;.�	8>y�����'���t:�GWo����'���k<s��e�7(��OkO*�on �y�n�,��g���p�������F��OI<��X�U9a����eo'��]T�A~v��D0����#��S��QP��Bc��Tc�y��
���#z������o��>�@��r��!��������X�������e9���8v@�Y�},�����;���ZM���F�v���C[h�����z	WL�`��`;N����RK�(�J1&�n��O&�m�l�b�0�g���b��+V�J���� +p:�?�L[��{���(W��/�'�"�G���*���Q�7Iu����g��F[@���KSc����v��M��u�Q�:"�=v�����>��;�K�Cw�{�.������ ��Ek��;�z�eDY��x�m��C����z����D��F�������t���7�����A��*��RN�sb~k�����>��&�5�PT��j&M/�1��R�J�y����?&�X&����U1K���������[��R3�B>G[�s��B����D�j��V������"���;t0��>����ZK!���`�3@��0P��G�1�M�`����1�0>�����������4���tQVP�:�o�HJ�����*����j���������������y� E���{UPGl��c�*����v���*����qf��TJ�����AU�+m���i�C�0��d����ZqJ(|����#���5�N���%�Y��q���
��E�H%dY����$v����$�U�@��

��1�	��kw�X�s����kc��ZA����k���d���j���7	����\������[�
|v��|��7���p��5Hs�m:5�����-/L���0��+�7������������K���wz�||���x�1^Br���d��Q�\;���A\�0w���]�"T�zo �����r������.����|O!��eu/3�PK���u]lPKOepK(gather_analysis_all/gather_head/20_4.outUX�Z^:
Z��Xmo�6��_�/C�a!��b,���^� K6�d�6�%E�����w%[���q�a����������H��/����=�<y�/(C�w���:��q�P����
)�a��Q�~*�(�O��������"��2��3�4�Kb0^>�f�,M3�����]��7R�Z��6uA*����!*Y6�]��Q����w����!*���M]�����<�a6���VS<�W�n��A%�v����I��i����6�6�$7�(�)���aW�lA�R�s<+]>D�N�x������K7���eg�+����G@��a}U��Q�P�9k��?���$������N0c�a�h����W����G	J�De�&+���`]tu�X��Hue�wR�p|�����?O���x����$�Y���XM&�>�]qr�������(�yw	��uH)m;�P��	N���{�#H���4��X��
A�q�z�Y��|q4�?���D0����!FbU�
U��js;a��S�R�k�"�{=�G"�mTNW�m3@�'�B+���[*,��5����^|����./��,J��������"O��c�Z�B����h�N�RA&�Zbl��sZ'/�!��`��`[A����R+�(�F1&A���d���0��#�a�H�qzi$��
L�#<��g#��������,��re0�P(�7QL	|v�3
�Ch�|T1lT�M@~.� ����U�{�������(����HuD���N\����d������:��v���}�^Fy4���Z����^}QVa.�[_q ����*��AdY��YU��a��K��:�����Ixs���J�����=���qi
2 �1�cRk�����&M'�IHu��4�7�����X&���VU1O��������[�2�B>�Z�N�%�qh����A*{C��&�����(��������k!�Z
il��<�|H�*����P=�@���c�a|Y���������i.O�����u\����s�P[vU�#����#��T�#����c����U\����eAi��W���?�@gj�������"�)'�hL�Ym*	I�l� W���m��aM�l���8"Rk�)�����������=��j��!���+ )�h�Jp������i�_mwE��Y

�w4�w�'���]����	������W��j��]��n&^�i�0�y:�����t�C��zW��������!h^���5����%l�e�[]�t����J���:Y��!*��=q����VoX���Q<�+�AL���1z��{T,�����E��1���lW�Hz�X���"8���E��w�Eu�U���0N������I�z�PK�f�YalPKcepK(gather_analysis_all/gather_head/21_1.outUX�:
Z�:
Z��WMo�8��W��d�!HJ�26�4��n�	�bO�*�`Yr%�M��E���8�GrY^$9!g���7C��t}�zy�/��:�~[;
N�/k���Yb:�����i�(&��;*�_�)E������1:�b��Rd���$U�{��
�-��4���v<�N�#t��c.\���is~U�1�Cgt������������Q������e����|6v�������k41���|�lfy�jz���b���������G�t�$X�V���(�RX��Gb���"5@��_�>�����]����{�z���^��y)4����K�h���>3/��)?I<���������6�5ct>���&��<�����!�A+v�B�4�6BJ,$�>�V�Dvb��T�NP�C�u"d}'�;��)�����{�6�oy15E�n�(�n��*Zd�d�M}&����D�0^��jP�1�<NX->L*-&��U��1I�)@1R������yD�G)]}o}�-��n!KB���&$��n{R�%dp���J����D)����_�p�q�lbJ�,�����r�#X����`����q�6��@}@��
����J��&����_�y6v���D�m���>
7Q��I!���8�P��4�Lb����@+h�"P���
�������=�#�x��e��T
d�p�Z���0�I�v����(�g0/����Dc��C���y�l��h��zz���[�[��o�.��s�����"�[BxT3��,�%|��u��eto��|S���'�%�v
�
���!8��l�D�$���@p�����+���#����<�kYY5��:��x{v}q�n8�>�'Q�J�6�~u�^�����9����F�z�5([���!�8����E�����x�(�]�x���8[��JV+���f�!2�i������ll��$��QCJ����0����^����u�7��/�����ja�(������Q��5���'R���+G�1�8������Q�|T����F��{���JT�[��.�J���M���8���K�f�Y��~X�4�A	��\
�|n������pA�rG:��m�<�8���}1�(�t[
���c�z��l�{�L]���c":���,m�Ema��D���r�po�hV�����hQ��D����_�j����PKoq����PKmepK(gather_analysis_all/gather_head/21_2.outUXu�Z�:
Z��XMo�8��W��d�!DR�26�4��n�	�bO�*�`Yr%�M��E�����Nr��X�E��{3�##�������������m� 4:���]E��@�(L���yTKJ�G0�\�,����+��S��� ,VA��ha�%	���w�h���P��K�:n'��K��q���	������?�ql�������c�����w����82�'I�0�5�M1K�cT���k43�2/c�li�15?�,���������}f�������V�,����)�%/�+_�%RD�4���lw&����E3J{�����QrYF�+�hE��w��e��X���c��6Zj��H7�k�f�� &t��;��:}"��%k�.���g�QK�tx���	���ZAQA�E;����DM$��	�.+�}E�u�.��fs���&�����5W�*	gu��3�6
���=��X�Z�5U�s���RiQ3�@���c&��)�'�:�����ds?x�[�B�����?�]H�5�S�5d[=��P^)��)�`��.h������Z*�ETi���a���)�gB�5*�G����#�)'�	R�%�(��w�������Y�E�L-�
:�iS������M�qlb���.�&(�feT���Q��,5����t��5�#�X���������lS����&ZS{�A�O�t���}������A�r����+��U�EW~:���nu�z�z��������%h����)rH�h
�Ab	��>���<�7c���u~
�`�88��p (�B�S
Nt��8l�l�DQ�����~g
��G�a�A���|���{ )GP�wo��/.������,�^���f8��f�>��E�D�`z]C^�\�B���
Q�
��S,W�
*���@���Z��`h�	������W�Z�t�0[��y��A���p����dj��D��IMJ���&�>#(��c����M ��x3n����U�U%�(��0nN��Ps����� ���AJ� �r�����y�A��H'eh�r����{�D��y����i�.��k��d��t6@��� 
�U5�3"�z:����}�������9�q:��C1`m�E��b�0p/|0����������Dtv��z��5���Z��Z�-@c�a��h�����	W�h�~���L������x4�PK�����PK�epK(gather_analysis_all/gather_head/21_3.outUXy�Z�:
Z��XMo�8��W��d�!�!���.���vw� MP{2T����+�m��w(J�>�Dq����Rlrf�{3�7�/_/n�E���Wo������F�2ZFBGa�N���bB1�>�,��O(�����{��(�u�"Z�	�9��������j	!(N�<�������a���;�w��t��y[���8���}��=����*�L��i,Mg�gS������
�M����?�e��v\-��Klnf�zuzw����0�,	���%y6K�8&>+�d
{�sK���i:+#p��Lno��f���%�����3�a�+.�b��f��gb}���(\���R{MF��^��03t	1������i���S)m����3�{�R���f�
O��������� (����c� �A0�S������c�������d9���$1�7|�e�N���E]&n���De1��b���m�0���p�gRi�fB��Y��(.L#f8�Zu.���=���}��7������?�]H&4�S�
d���3*�BL�X
[O�������ZgJ{���m|��|�!���}�0�A���LP��Lk�� U��TZ����2��4�Yzt�I�
���.
�A���A�?�m$(MP%&*�dT�C�(h�"��.K
&����]�`
S�b!�&-��
��k�q���43��V���g �i�.a^�����[�?P� l@��Fy�~-���!���yK�z��	��;i_���:\�"���f��,���G��wf��P����<�|8��p ��<;(�?��8l�l�DQ����@g�4�~	@������`:��@R�$��}�9�:�x7_�� ;��]��_4�B}�`�>�����ns�
����
�v��C���B*XTI��;��P\�v�8���@^K���f�12�����@r��|�������hv?�I�R��D�g�z�t��b��|��7�����m��P50Z�Qr��l���<�5��I��!H�|�`e��'�z��yCpB����E:-CsP�#F�����NT��A������^����h��d�\:[ Z�k�&A�XU}�S�����ms�<�'�B�b,n2�@��0�����1������?��%Dv0p�����J����1�^}��v��xzB��}�Bn#������]��p]��7>�fn�=��>��PK������PKZepK(gather_analysis_all/gather_head/21_4.outUX~�Zs:
Z��X�o�6~�_��&�B��Hs��M�mi�%(�=�L��e���6��w�$[���v���E�m��������������������m� 48������q��I���f��P���(�~#�~��r:����Q�T�s;�2c`��T�KAI�-��t���G�.��c.X�#�is�����Cgt��'������w~E�r�Hb��b��s�[����l2De�8�FS.
��;�r�����z����<[.���s{��%�F��$/GI����kLuP-Q ����*����������{�z��'�dFz/����\(����f��|}����8�u�0��6#�h�mQ�	���yZ���,N���R+��]���8`
.5j��[�S�w���;'(�"�:!����T	-0�
a�����m��,���@7I�����
�i4�qQ��;;�2��@;�K@q*�	�J��1�M�0!w����OqR�#a8����G��{���|�-o�tY�����5��4t����� �\$��Q%�45b�h��6jWc�D�J�v
�6���a[��9,�=B9&PYA@5����I��!�Cb�AH%V-�����]5��t��m��Fm���>
7a&�M ����(LQ��$Nm\�9�Q�����0VQR��g�0�w*4	�T���Tq
��p��N�qn#/��;� ��Q6�~�M_�� {��B;s�l�k�$�Vi�n}���|����n���A�b�lY@x�0��%�%List��Exo�HJ�����'�!�s���R;���yT�^�k"
�8K��@@�{��Y����4�<���X�@�[P�w�n��?\��-�i����b8���A}��p�>C������^�H
�63DofH�z}�c� CjX,TI4k��5�.Q*���q���HV=NI���{��!9��->�H'�aEH<y7��!�]��3�K����b�d�?���jT1�
?�F�"N�Q�O����P��k�d�A����y��5�M;������Q�l�]���-F3?h����eX./H���m��W;���K���Y��~X�48s��m9U�4=p����
��'��6c�8���C1�]�#�1�0�.|;`�=�}3u���������������g��W��!2����|�����7��"�C��=��>�PK�c��PKXepK(gather_analysis_all/gather_head/22_1.outUX�Zo:
Z���}k�0����8���B/~�)d[�uK��e�2Fp��8vj�k����Slg��`eL,bI������k|�tvs�����	�8��Gz�$�V�������HAH=u!K���h�G�7�
B]1�h�:����9>!��������p�D�:``eJY������)�y�q�]Aw8��0��f���'��9�Db���q<NH5��S�M�G*[
��2\���N�U8��������9��*����A��kzA�����u��Y6u����3~P7���D�D���lId�"�c��z���&�N'(m=���J`�p+0.<�V�]c���R���Zb��~� ^9��X��
���0��x�)�&t�^�u�����dxd�LGi����V3��T��+��:�����-�[�i1�b~������/_��,7H7j�~S��/�l���m�m���t���J�����P*���_����!��K���]������JUF�U�''��4����[lB�	���Z��-)�����"���?9�R�Q��{(�p�/2���ex8v���/�����T���=�n�#x�F���7U`�s�j����F1*pF�������kfY�m�\��t�k���p(�a��
��?"8�	eUI���0������].�.�(�n=/�����$���9�B4���l�wv��<Vs�w��j�)�k�,�W��^rS�����QK��&V��Jl\+]�t���?���VS8W??yh��^�\
#
E$�0�p6Sa�#��|�`)=���e��F�PKK��
�	PK[epK(gather_analysis_all/gather_head/22_2.outUXu:
Zu:
Z���mk�0���SePRa=���m��-���1��uDc���-�����N��5i+cz���t����2�s����/���t�l[�����~<�
�e��Q�p���
����Qx�f��X�02e�����'�GI@%!u�X�d����p���"	S�����yZ�����!�I��I�]���]��B�&�S������\&��D�b�!u���6���u����E:V����E:U������9�G:/�
��A��9��L���I��"�M�,Ou��T������|,d���5�,Q�I����pa����0��m�����u�`�P5�^���V�p>gH�J<�nM&q���W��81:���D7a�[��,F��_��W������n����T��]�����Vg^�����w;�u��P��A������/_��,�H�z�}���-��@�U�9�yVN�t���t�m��	?W�r�X�Tlv�<1�����*u�Zy�`����������Bs�	�x:��V��E>��F��
��m��f���2�F���3�[�����b�ru�-�\;�G@8JM��8�}���2x�{D�?���+��E�@�5�1�(�r�,:���K>�E���W\�n�V�ha8�5���Z�pl	�w�ls"���7��)�����<Mz
�dV�Z�`z����o��#=�zWo�{���q�W���r�ig�c����[��^e)\'[������[�y��	�h����n��_�Z-#:�p��q�����Jc���c8Tf?:�*��j�PK>d6�	PK�epK(gather_analysis_all/gather_head/22_3.outUX��Z�:
Z���mk�0���SeP\aI�)d[�uK��e�2Fp��8vj�k����Slg��`eLo|���w��_x������\��W��%�q�z�$Vf��C������&t!����x�F���#U�	�x";���P��D��$���l��E��$L���KU�i/�O'��q&�w	��0��P�������p�#�J���\�#OHl�i�N�&T#����t��U����8����|�5]5>g�X�hAR9h��{N/,�h�c���s#��n���*bus��.d��&��B&�L�V>�m���R��Be��	�����m/����`�{A��������d��AM&p�s��q�d����a�{��4F�W��<*��B�q:<�s��,��v[����Z�jup���?�cq���b���������j��������9������l��1�5�c�g�t������-�~�H�����C.K����X$��r�^W�n#�*��4�m�������������FG�l���"�z�N-�
��m.�e���2�����K�[���������2]��,v����MU��8�}���e���� ��W�jl�]�aCEv_3�Nl���}����������qq���#,�s����[��0���?���].d.�Q��z^�9�i2�d-��x0�g����y,�Z���j�m��\�B��y���-B����Q+��V���R<�f�����/���V�S8�
?El��_�Z5#:�v>L
�f2*U��V_��q��v�G�U���PK|H3��	PKYepK(gather_analysis_all/gather_head/22_4.outUX�Zq:
Z��Ukk�0��_q)�:�
K�3�B��[���Z�(c���c���&���~�yV��E���s���x������\�W�@;����h)=���Q��"'�9:����G�1�I�e��P�A*���m9.��OHsjb)��w4��"��u$@AO���$��/��S��(	�����x��q�D��y�118!��^E��y�M<�R�6<�*R�	�D���(m�����t���t��������Y�S�fP��Q��6� O����u���l����T'v�7F\���<��fo)dE�"�[��!6Z&���3��9i�$�Y����1��
��6������0��6�[�U�<��H����0Cu��O������k����>Si���
��D&���*�PP��Z��~��Z7
�f��7�y\����r���F��71��e���\�9Zs-S��7�JM�6���P��|FZ:,�9�R�U��[ebx/��uV�6���<�P�������(��&h�m����}�W;y�uv��L�7����j"Q{yN���K�[����rk�r��(�Jd
;;@��OT�d�����S��#�%6��������O�v�1��|��i;�M�+:��X(�*0�M����X9�����}����O���c�R!s������"�|H�eEk�!X�F��LG8���S�,������$8���5��-/�b���
�"zoa�W2����M�(���7�[��\(|~�.�Z�^�Q8F8�2
�"�U��VP����O�U�v4�PK���w�	PKnepK'gather_analysis_all/gather_head/2_1.outUX�Z�:
Z��WMo�6��W�R�6�DQR ���v�4M=�C�BI+��I��;I������Cy�D���7oG����vz�7�:?�|�����9F�<z�
���$/�W~ ����(K���}����1
���y�W��5�(_`N������J��&#���g�n��GX#Am�z�����}��G(_��*��g�|^����7gPQ�x�����:3��Y��Y����[G_�b���"I/�R�i^Ex������t��D�Bg�~O��� $������^��W���8H3�b0�z�0�?Kbpy�W����4w��*��V���nG�s5�Y�]���I��kp]@_�=�{n�9��
%l���Rg6��`T>�,<)r���X�@S��Rn�O�����W�=�,GeL����u��:�/�,,c�
�p��+�)��O�fx��(^T���eV��zq!C�P.B�@`F��<��.�L
�=�0l��
j��v�kF���8������@�����.?�}����	:��|���?�OU7��8�L���`[9���� 25���b2b������R?�����w2	���FOz�e��_����)��)E�e������<��@y����a��r��H	^��D���+���3�0���M�<�]O�*��JLs4(����(`Rk��� `V!�����h�a}�dzFT�3�6��O�+��=�������q����;Gt�".G7�j������0:��^�q5-�K��0{����d�����Id!n5���R=�
#F���ct�*���f�'��f�bp��2�����V�o9����V�(>�j��e�T��GC����@��8����������0�DV����<��_�����_%B"o�l��V�h';���Q�N���b%u��<\�����E�{B���kU)�|�3���oa�l4�{���Z�[�=r
Fe�^*l�U3*��>���1	��r����	����#��!J`��%^�����v
�6��?i��3�������zmF������_	������5m����,�_�=E��*�,����[���M~p����T\���U�J��Tz�)M>%}K�r��?�"?B�������/������6:�]Mh7����C��&.��w�lj��6gw7�f��n���������Y7
��v���{{*����zK��h�`_o
5������2��	>�z�Gh����*2��l�u T��X�����_PK�3���PKiepK'gather_analysis_all/gather_head/2_2.outUX�Z�:
Z��W]o�6}����h~��d@�z[�$K���!l!��Jr����}�")[v��y_,�y���^!������otyvr�&�����r��Y����iQ<�C��q}���{qL����\��M�8���h���A30�e(06�>u��Vi���d���u������M�A�nfT�}ROG�Xg�*V9.f�|^�������P�q��d�D���<� ��g3�s��ZK��r���2�/�REYQ!<Wi����������
�����AH&=�����=�
|�1�a��� ��2�z�0�l~�&��	WH5|��ig�*���U�r��{�"�I#�������U� :�,qa�+ ���C�O�%9��������\����A�	@���b�Y�<)s����pc`���>�wRf��4�Wy�4�D{b���h���{&jL�Q�Vj�>&w�]���u'�*�Lg%M��B������8��.yh�d�����TP����:��q6+�259:*�c�~�@�\|<�4=;0�&����aV��(�?
�y����?��>l;��TB��������0�Fl���[7\b�# ��(��f*�d�6~�S,{^q>�u���z^�m�hC�PE	4�AdZ� {H/	��'�20N����~��g�/�F�ua��X�&q��n$D'�2�=���C89�D�L,`
����4��$"tS��nF�
��$W�0�j�q�y8x��F���e��Q_�����@�|Q�<sD�.�WtQ�f��.3j������7W]N�Kn��2�_��.3�a��3d�f�[�a����x��5{��t�� �{���df����=n���P��l���V�L��
�>�h�&d�T6F#���� ��8����:�m�hd�Y�N���<i
{UFe��E]���
���1���G���\�u���,P�QY�9��h��5���@�15�U��������D����R�I���R�n�c��Z�������N>N��)	��3G'|:P����Dz�\v�x�M�5��mM����)�qC�����k�6����������D������QB�������" �"��fx���W�]����lw�w	��^����#��JO>}�B�U�����^����	��J�2���E7E�P���jB�8z���cp���������m��nx%.���n����	�����Y7
�3�v�o����i9{}��h�`_o
5�����t
O�$�9/�C1B�G5_W��?�R��r���~LF�PK��:��PK�epK'gather_analysis_all/gather_head/2_3.outUXx�Z�:
Z��X]o�6}����h~��d@�z[�$����!l!��Jr����}W")[v��y_$�"y����WF�-��7���������L�F���6Bg�CT 4^$yq�������EY�-?��[tW�Nh�&h.�M�FE���G�
�@`ln��B�:IR��������N��u����_��j���i��M��#��|.�-��qv����"Jb���A��}�t�aV�t^^��Sk�s]���)*����t���s��d�����o����l���I;�d����[��<�>S�1�:����L�"L3�Hby��N*�%|����3C�/���������{*�T�����:��t���0��U�������S�P��kX�tfX�`�	���^��gY��t�q�"N`
��0�N9�O�����W���,G%�X{��{g�&^��Xb��p��k�)����z|m�(^V;1O�]I���A��.B0�7FM�<��>��*(��/��Z������<���)���������t���������������y��a�����!
��@�����BR	��@�B���9lA��Fl���'o8��[)YG9�G2�gt����d90����M��oZD�yJ�]�hC��y4�Ad[( ��I���`JJ��f�Gq"�����2���X�&q��n$Y�zv��E�L,`
��z 5���DH����Mk�a}�dz	7�T�=����g?h����������	����3Gt�"�.J��/������:a�������3���t�
�N��L�X����l3����ZR��#f8
�w_���Q�=�]M27������[i(!T��j'	EO��({/l�������fB�EEnc4�������K�3����c�|l�+d�����S�?��Q�OueB�70���@��mgg��6��S2p�\B��E]�0<
���k&)��&|cj~+K������u�U�0���fYK�o���sNV��jZ����S:�}J���Zz�����}�pY�H���=�^�k�aM�|W����)o��? jj]��Q��Ty�sa�W�`"��N���UB�������$ &	H+	<p<�Zv�|�?8���]*��xd�BR4+������#�k
�����.�)��	x�2)]
�n�p���W�vq��
�3��^gDOQc.��\�@-Z���o���7?��9�C�Y�5��r���b��[���eiQ^^][������r�Y]W�%�2�)������|�f�z����_h��W�������&���PK�[����PKeepK'gather_analysis_all/gather_head/2_4.outUX�Z�:
Z��WMo�6��W�R�6�DIR ����M�4AP�d(aq$�$�&=�o�H$%�������X�)r����'��b�y;��]��]���hr��t>�9B�e����\_b�c\]�4���R�-�����g34
��6��<|T��p�aJ$���9�B�&���M����������v��7=��>���m�d�g�`���`�����P�a�h����<� 	�'���A=���P�:�?Ay�_��
��Dx��vu����������=#���L:s�������/�R��9�����d=D��7�G��	��H�8����
[Binary attachment omitted: gather_analysis_all.zip — EXPLAIN ANALYZE output files (gather_head/*.out and gather_robert_patch/*.out) for queries 3 through 10, several runs each, comparing head against the patched Gather node.]
����z�A�zo@d����Dh��"��k]>!8������gb�����u0�MLM������6�r0k6�z��%��.c���F����Pu��1Q��c?%[%E%�%:%%���l^�B!�'C���<5'$z�kO���������4[E��K5�q��;Bka���
�%��T�t�� ����?�J�T�t���G���f++����~E���=��e|*�<9,��`��6�}����	����8����A5aLQC@g������Z�mSa� :0�S���e�K;^���8`��wAV���r����!T��Q���XISA��,����YT�Y��Mp3�A^E�w7��w+(�;��q1���0�w�y�#NrH�a1�`����I�m�g�)���_���+���w�:��6�������h��?����A����|����D�( �p�@F(�����H�	��@������8��N�Y	l��TP��M�-��w$��
M�������k�������M������E�f��%�I��-�0Cs�l@���yNh���!������y���,������H��LP@3�����<��j�q��m�����,��g����Y��0������!f��%��S3�C�0Ws����~�$!���Q��dlqf����ua�i���� ����rH��y��Z#�(�j���q�r(1c�A���>
BQ�/Jr���9�L>��;�z���_�S�Xf�;�OT��5�BA��]���� ���e\l��>�_�����>iRK-V�|H����QX����ob��w�tt�SA����Q-����]����"-�M]Lw��J�;��*��m@t�W��-�V���*�t�6�]����J�Z+.(� �p����o�Ec���VKX
?��e��9����;�j�N���%�B��I�����&
o����A.�Q��S�1�^DC��j,�h�����^���A�cKh�g�]^?�����9�ZG���C�s�6O���z��}��Ww5<[N���)U\���BO��gRC��N=�/�x�{�;���8�tEL��a?������
	�H-����F����M�|x���q'�E�z0Sow�\��:��#/� C����R��S�L_�Z��)��C����S^a��
j'��W�?��m�����qw�7�1����8%���yQ=�[`��n�(wT�����"����&��v���q�nN��Rw������Z�� ���������O���f3xG���9�l~��F��d���aJ������Y-���
U�����1�����w}y�|]����p����������o/�g�O���A�1���{���j�x	��V<�B��C'<d��o3�y��<�eo�nT>���W=��<��}�
�b^����<t�F����DU���EO�<r������s�M�8���.�
B�6?C�����|���*�*e�>]��><?;�PK2�;-�.PKwfpK0gather_analysis_all/gather_robert_patch/10_4.outUXP�Z�<
Z��Z�o��������.b���Q���w����AP��A��^���F�����CQZ�k��J�:0��f�����B�o���7��@���z����s�������z�l��E��%�p�0��c�P��V\1�~K��
�-����5*���b�H�5a�C�{h�e��������Z����DqU����8�������si�q/���6��:I]R�
^���K�.�r�'�C@����eZ�CI�.���eR�Y���@�Q�7��%J�2�R�6���,��+���v���Uq���������Y~H��*��

��Zm���$�d���m*4P�������m�����=��r�-/Q�m/�����Em�_�&�A6�w����)*V�nosw����UJ*����"���iqF��ZJoq�i��ZI2j�l?��K��c�U��cG�H�?���cT�\��1�� FY��&��1Q�7����@�f�cj�2f��������a���C{9f�q��e�@��;���:JS���{Ui�zda�(��,I��[cn�%X6�b�Za���FC�eDB5R��#�`�L�
,��`�}L�O��X�W��Sm���������������L>����\�����`-�,�B��ZZ���t���e�Y���0��	�����P5g���
���j������X��!��pZ&������hR�
K����(���;"LIL����@�������o�u"P�>�>d�d���1����n���
8
�E@����
��Q�h[;"��9d2x:�G �7�����Z�xu���]�����
�@N��<Z��}p���8JQ��mu�Nb�Z�EP�+jcX5��@i��=�@��a��(�
��e�K;`�Z���`���Q^���j����!�7.�L���L��0�L��}����U��s�vP�QY����������|R�s�d;��<���l��?l����Yv����K�w���!�����f���h�^�o�k�L������N84��g<��'B�����OwY�B?"����H��,���^I�'J`	�Mac<Y{-�P�G�&��k�XHo�R�U-��v�����s�?R�d4��=�Lr�&i�F�4����eC3l��*~r�jr� u{��U|���1b5�f����m��>�-��|+m�����pn���`�z8f�g����N��s��������bF�?������6Z����a��9�pe��4(X�����
���,��sH��n+�F ]�4Q��x�Bp�?������wGT����U�g"��'��[(�p�%4������.��]�:�3i�O|^
���:)7�����^��c�:��rB��"����k���U$�����s03fPj�=@��vX�.jK��l����)��KC=����Y^��� z�w_��-�V���������]����z�Z+.(� ������W�~@c��X���~!��<�-*�l���*��������O2U�^��Y|5�����r
�!���w4N�o�gHG^?�xy�����H��j�9�z�FB�*}+	���#t��#�1�0(w��Y"v����3t�����,�x�9i�����+(
�BO��g
:}�oo�`��(�����@�0b�����l�Y�#��+�P|����88�G3�2B��IM{R3�I�=.��r�nGA'@�����R	�P]����#Sn����U��a��BE#8�5��(���M�>�n��������qfLo��C����f4��>���SR���-�E�����&��Tex���b7�MC)<��~��e>���$�o����B���j2��;2Gnc��#F�����i� ��6��-sxc+��X�
�Q�e���"c�o���v������/%����,�>���`���iV�$A�����:1���w��_F��s���p���<G^���@\�F�#����P~�����:�:M����<t��8��9�q���K�p���J��z�%����g������W��j8V���.����gg�PKl`\�/PK�fpK0gather_analysis_all/gather_robert_patch/11_1.outUXS�Z�<
Z��V[o�H~��8H$H���K� ������B���8���c��@��~�x|IR��/�kl����w��.�~9��Wg��7�^�a2��h�0�������RB�d��|��g4/�3I'0��b(���)*9��������NFH��$*�� �Le����W�^�x%A���l���"(t��I���)B����H�D�+J<���z��E��Q�!(�:�U(��[�����J�H.����W���i��Y��D�������$\�d�_Y!���������'%A���N������y��p�[H�c�v�&�q���a@Q�|�l*P�_��3)��C :(�>�Q���$\0��:�J�A(�-(��6�]�DV*���6���&�tv�H~�E�&+���5~N��'�^�=�c������+M�Q���Zy�mP{;$��h0�3br�+e����S�p���x�yT����H���^+o�p��|
�r��A��d^>�u�m�K,�����o56�rn]�:s���Le�T�tS����Bgr&��C�����������M��G�x��������}�j�r8
?&s}oi*�(Y@4���`�2�yk&
���[��eQ
��6m��F�����B���k��G�w��q�����8���?WWV��fm��+�A�W�o�W�\��T��T�5R+�����,���!������`�O�#���b;�N�r��(�L&�������?�b���we���2�,�t��'Mn?h<X�}��)��3���)`~3K0�w���YZn,���5z�pw��I�����<q���8�}�_*����d]s����/��4��f!x}D
*��6�tJ�N�,�1�h�rL�����f��4w�{������'�����9���
������������{nR�O\����Zy5;�w�#�de�0�
1'�0Ok	C}�;=!�MHp��}`���U�j~7V�8V �����uXV�l�(�y���9���h�?PK�����PKwfpK0gather_analysis_all/gather_robert_patch/11_2.outUXP�Z�<
Z��VKo�F��W��@�Yp\.(��:iR�pmEOC-$B)��X���%w��E��t������f��xu������psuq��g����Lp�,�
`�e5���x!N�"�YN(�L��|"������d�'�	�0���n� ��n
��YR��Q����"+��|x��c�Ei�����������#*T �	������=�h�1"��
Vs�=rb����.�
r�� o���
��3(-���V����.J0�dz:vz�UTg���"��MTT	zs��Bi5~;*D����b������Z����.uxDI<�6�C����u���G�/���n�����W~��9|��l��@��x��8�+�C'LIs�(��_aOZi\��g��aY�Vi��{J��,��<[�5L���u�y��t��(Mu
w��bl�<g
667���,�]Q�-�~5����BB��iL�����llA	1�{�rh��f!��-�Cy_�]�c��U�I���^�����L�!\����i�6u]2w��*
�P��Cm�c�V���3��Xbo�\�\\�����yT<��[tn������&����l�[��2�f�L�WHW�8���i�����m���V9�:R2
�~���!�q�+��c�����q;=l_7G6�����0PHb�K���M,0<=H��
R/<6H[i����
�,�C��k��
;_�Q�>T��x��K]$�h4��.�>���UW����xQ6����I��������A�)r8B�|F���!�\ZP;
A+���^�HlGjt�����7�#h�nsy9�'�J������o������r.� 8]�g�^��l�H�=q�3.�I)}����"���.s�b[���\�����E�O��5�)��.z��5��ey�;����2d7�w�����A���r�c�Q���yje��]bz\a^6��&��SaJ����s�<���s7Q��r�C�r��:��zn�H�(Q�*�8��6�PK�4(E��PKRfpK0gather_analysis_all/gather_robert_patch/11_3.outUXY�ZD<
Z��V�n�F}�W�C�H��^�$(��:iR�pmE��ZH�(R�%�����rI]��v��y�W�r����3����������������������:^�%�8��r���3�8�����bF�G</�3�L`Fe&P�k5���$!ve>p ��
~8!-����M�(�sUVyZ�/�~����8
���E�a�,"*|O����>"�Q)�e�4+��."k����\�|�!'kr�=��CJ����������_Y�Ry��T���N���4Z������y#�c�x�Y���q��T�P�j�;�J��F���5n�����&LJ�z
zXPa��/#�5�\xgAtP~�%|�����!�SI���8�-�c+���1��!��R����6�$V�%�C�q���f`��8�:��{�$*�{���W��u�������-j�zX
����y��|�S'��� ]"(!����;��F�7�����V�,�@�%�K�$���Zg���BM!X]?U����I]��}����P���P�}��R�Xr���
K�������7���M��g�x��������m�z`�|~J����Tq��x���A�������6@���v����V���oi�~�AMx���v��En7m�)Z�3����q}d]
Y��P�����"&�D	����A�������jo�m�,����F��V�_a�M�=���r;�N�j��8�L&���������r�a��U��:2�,�tw�Olo
��=Gs%��@pK�NS`�c�U��n�z��9����$
�u���}"�~����t��r�>�~����.z�u�;�%�*���4mg �I)]����"�����G����Ol�3�;�=*�|�dF<=�g�~{
�|�@���Vg�q�(d;�����=����7��VQ�H���'c{��w�p�y�H87���B�w�s��y��S�������1�{����TT��l�H�c�SG�s�d4�PK,w���PK\fpK0gather_analysis_all/gather_robert_patch/11_4.outUXD�ZW<
Z��WKo�F��W��@�Yp\��I�4�c�6��'��!���H����r���E��t��������
����o���������
���3�U��K�q����)g�q����bF�g</�3�L`Fe&P�k5s%cDr��lp ��
n�����i\�$a
��*�<-�V�x��c��I�����"W��T�'u	i4����D�e�0W��F�vY��?��R��9	Xk����	����5��������|��4)��O��^sVi�<�H��	�2F4����������p��+F��������VE��p�K-��������H)��5�aB9�@���7�cTp���A�=,��%���@��86>(6��A8.�	�l��`���!K1n���l�X����iX�Y�R[����q�=tz�0ITw��",�,k
:���)x���	�0p=�S�`i��:94��A	�����x����$<��U�Re1�0�?�+I����Y��oE�PSV����}�����
�>��\�U�rW�|��R��r:��
S�������7���M��g�x��������}�Z`�|~N����Tq��x�x��v���^�2@���V�eQ`��l}K#�{x�j��Mo;�|���<7����-���p��>���,og������\2�6��c�;3HE;HE� 5R[�Cm����-i��l5����?�8y(���4��*���d�]�}88��*���P����=3��Aww�������fP��Z0�Q"�`
\i)@�4F>�Y�1L�z�u�����6�#i�������_���������������A�yO$} ��rx7/���8M�zRJW�@[Ex
<��<l��!�����X���t�4O3'���/��._� O����4�����\��+�Ga��[#�I@�g���^c������l$��L��[������������w��6��
���������:��WJ�'�����O�d4�PK������PK`fpK0gather_analysis_all/gather_robert_patch/12_1.outUXF�Z\<
Z��T�j�@}�Wy�
����Hk��i������P�Q����BZ7���H�����)��\��9���O_�'7p=]�J9����i=�y����]�(e�iB��TK��}2�O7=�(v�(��&�}�#Z`��i�Y[�}�����\TvQ�f���"gv
r!�R�t	�T���};kk��8�,-L�LN�i=O��&f������T��T�}��To� Z}�$RS�'X@(_�p���{@v��Vw���:���$C�<�����{]W�v&�1�Y� �#��	���aB	��@�������d�pBtIA�P�`���(�L�Eb�s���bi�0�Ll��
,%g�=�
�����L	B�����lg���������0����������.'Go&����������Q����������
�b�����%��O!�V&6i�����U����}�g!3��	ex!���������a{aD���Ey�mus�pm3���yZl/LLn��n�4���P�a]���V	JzZ���F���!5���;���k���\�s&���N[
�p{e��b��m�7�����%��=80���H�;���j�As�g�A��PK����b!PKvfpK0gather_analysis_all/gather_robert_patch/12_2.outUX=P
Z�<
Z��T[k�0~��8�%	�B���B][W���y�Q��-S�2��&��;���%���>�`p��r�����Up����E�z��g�`�d�D�2>��&	���&��R?V>��$6�r�0#�S0I�|�SI�!���n6�Z�����K��R/��|^�yh� �tN����D�E��h0��Z� Mr���tV-�"����5�B�P%�W��;!�m�l��S]"%���K�q���#$����UV0M�<W����I������Z�����a�8[�q��]�a�8�3L8BZ�4P���3�D'��N�} a�a�a��n�X��c����|I���::�6fC��E���z�>F�����p����/d�hY��9z:�c������g��5�o��W��f��?��",�|>[�m�UbQ��Z\����RE*)��!���ov����=p�C�Ii�S�2:�y�$:m"P:j�Q[W�0+�V��K��	O:W��
�^T�������2��y��N���8�2���j]�B��7	x�P���]B���������e�9�������BM���26�b�����kh�L(��U=�Z�hi���xO��^t����OPK�|�_!PKafpK0gather_analysis_all/gather_robert_patch/12_3.outUXu�Z^<
Z��T]k�0}�����	�B��a�����e����\[$��elem7��wm�I�$�O}���8���s������������W����1�i��`����RF�&�kA� ����!��4q��S#F�[E�47a �$LQB�_}Afm����;9�����,�YD���\h)}�T�@=�(�Q��h1��yC�&u&'��^�en��k�Eni*�h��>v\s�������>DQ.$���%��+�CE�������LU�u�I��eL�U/��4s\�������0g�*������k���xZ�\I"�S����u;#l8\GU�e&��"1�9�
X�i��4y�ovl����,%q}OjF"��ez����|������c����p�]���:��_��rv�f6���>�o�xU����m�>����)v�m�-i�
��2�I��{�0�z�u�&��Ug!3��	e� ����m����q�`E#]���DY���\+&���y~��3��h�����>�ER}�;v�U����w���B���<E�(��%Dw���6__��|�_��v�Rh���WF�/vjh�����X�z�&^���V��dB5�C��0�PK��'{c!PKTfpK0gather_analysis_all/gather_robert_patch/12_4.outUX@�ZG<
Z��T�k�0�_q�%	�B�e�
u!l])�J�1F#��HLm����t���N�lI�>�a��:����x����*���d|�*p���3z0I��"]�Q���p�P��C�+��c��o�C��Y�)�$S�G�C\�"d����j]�}�C��/�K�,��y���Q�6w�6Vh��@�"=��F�	�zi�������j����.����*��*���q��$���C��EA�@G\���s#�f=B����|Pe�4�s�������<ZM�u����&��� �F
��	���7�q��uJ%���C%]W����H�q��e��*��<V+��9,�$�C�f�����l�R��H��Q�8��2y@�I��z(����m�e�V���`����;��W�0�	�������ht_D����}�l��IjT�E�j�3���z�/ ��*RI�����K��w�{{���>��9�/dt2��Im"P:j.�Q[W�0+�V��K��	O:W��
�^T�����;�6��qzz.�A�1���xP�z���I�	��u�:��A�)s�L��y���m������BM��?2�_�R����k�0��!�zp�R��$(�]�(��`V�8��z�PKl��A`!PK{fpK0gather_analysis_all/gather_robert_patch/13_1.outUXR�Z�<
Z����o�0���+nUaZ-�������6mm�Q�aOU,�
1K�Z^���HP�*�a'��l�w�9�
�&��������/�&@������4�_�^Q.
%*"d�C���^�� �z��@;N�,CH'�'�0c�][K��O��ia�>\���d~+%�t_:�N���UV&����E�Y��;�tv}��^�������s���?l��pa�]~zt	#O����M|����;YK�_�bt<�n�Z�5������v�h%����/��M+�]����uSf��U�T���X��)�Y���kJYb(����c�
!B�x�1+����$�	�%5���(�7�f5�6DPB�����b�}����Xb�-
�$'���p-�$�������=��E��i}K�5����Q�7\'qe������S&a����a�H<b�#N�]��T�������8�1��'��N��C�w��px0u� ��qRgqpXm�C�>�}�x�~�W�8������g���9�P{M#-��vT�S��f�7�d���PtA��p*p%���JT�-pS�C��0�7_��(7������bTX�D_y��]�W�8��PK���[��g.��k�v�����b��2`���PK�1�@x�PKyfpK0gather_analysis_all/gather_robert_patch/13_2.outUX�Z�<
Z���Ko�@�����!
T�j�k#Q)I�Vm��z�\�+�K��	���6B4R!y��<�y,�������O��<�~�:�O��I�x�v�
�c�
P�!kr�P�<$C?�1��v�Y4�����!F�Rk��sS�vZX�����	��[YrJ���p�)\%e��v�]�e�����O������6/��+�{;_�a����c7��w��k�hZ�,W6u9����lm�`����(����g����M���ah����2�����/��M+�]��&���<PFUyS��&r���(�*w�����!�]��!<7z�K�Y�]js�-T$h6�1������$�A��$��5V9�<��a�\b[L�H�
Zi� �Be�>@QI����p���X�@��o��r��o�Q������RP������E��H"#�<�!�zee��l�|N&����F@��6��n��G�����hj�a������������}�	��x�~�W�����*V�`Y#|��(����FZ����^��f�7�l�[_tAbO�x�xl���o�����f���G�9�X=����.?�
�s{o&Q�aOK�.P*
i���G�|��T78K=1��m�j�:��_PKO��*s�PKmfpK0gather_analysis_all/gather_robert_patch/13_3.outUXL�Zv<
Z���MO�@����"��j?��H���j� =!�Y%�7�7�\����#	�!E��Q$��yg���5�������O�:?�|�x�9�O��I�x�n�
?`�
e(	!krw_�'#?���(��h
>I�@�8	� �qe#�:7�g��M8�p���1|VJN��t|������L��.�����w��z�����Cs�����nK��.�?l���~�F}�nvt	������.��\���%K��Q19�s;��}��Z����6�5����R�PW�%w�Y�u[��n������*�"��E3���v�Ah.�!4Cnz�s�y�]js�.]$h�40L��\�5 Z�LI	�������05e�a�(�	��DhAbm �4�PTVu�2\��4j�����Q�7\�Qe�����RP.
������P$�X
l�t��*0ml�}N�����E@��6��~�����`f�Q����\����z���X����#��X�\
��X9��GBK&�mh�����v�h"��>��k�y�N����E$�DP�'��'OX�m��"[<����y�rs/��z��!F^x�(���je������PH��=�x�,�z�������.3e�^��PK]RCw�PK�fpK0gather_analysis_all/gather_robert_patch/13_4.outUXW�Z�<
Z���Mo1�����!
T�����D�$M[�I���S�],X�]�]��K~{�]��H9t��#��x��
�&��������/�&���Sk�y�����2��BJ"F������=�� �z,�@;N�4CH3�SJIK��4�lB��O��i��>\���b�u$�t_9�*�����*�������<��w:�����C����+J�o+�����q�������F.�����|����;Y��Y����x8,�0�	��+hm��-V�R�ARm���K���~�f���)7J��o����\�-(�Y��7{���������Z B�x�1-��\A�����C���Q�o>�--�DRB���q��rY��\`�
&�$\a
�JB�n "�4��P�V5u�s��4(����b��F��p��9x�URJ(v+��`�[)��G,�j-�������>���
�l#��2��N��C�w��px0q� ��qR��<8���!w{�>^�?n�f�Z�K����Ug������J
��M����%��jN��=���4�s����3T�J�����-pS�C��p��_��(�u����2^C<�EDH�Y+���j�9�i�J����-8{p�4��J��T����w��~:��_PK�DOv�PKjfpK0gather_analysis_all/gather_robert_patch/14_1.outUXJ�Zo<
Z��TMO�@��W�����~{�H���E�����Y%V;�7"��;�J 8-��Q��;~���]�7�o7�W?�������
��=8O���y�#���(EH�������]2��H��m��6�,\���DpFH��� ��%>=fx�1�l�<88�NWnj��kX��,�����d�������^6�Zv�@4��4<U`�����
�Lm������v����-e��v���C7��Q���n]4	MP������g4v}��b_�$k�S��4�K%�&Z���Rl;<���P4 Z����V���&d���Q��>�P�~�d.�nA��%J5w�A��e���o5�4u)���K8s�w��Z�-om�"z�I�"t��t_{�	5��r��|5�:��%�����m1K�����0TC);����\|�������JG���}�.�������:��<s�tV�6N�x^��m�|����g��B���l�6[�������A3	J(��i)(����2�J�LaBh"�{�I�)������[���=��B��P�g�����b���������C����2J_���:�;�~@�qLW�G�
����n
;u#�Z��/�����uIy����Ut�D��8��.q���$�VE�?)�,��n\��	6R��\3R���>�{�z�PK�����PKgfpK0gather_analysis_all/gather_robert_patch/14_2.outUX��Zj<
Z��T�n1}�+�������T"�6mZ�Q�(��Y�{��QH���CB�I�<�#�����s��������w�8;>�oCo��E�ln@?-+7�Jp��i���1��7	>��N�Jg�lnF��9Q��3_�����
3<�`��{����L�3�V�a��}m����nf�;
�6dm��roC��a�S�i���r97�
.2]f��<��L��t�wK��B/�E{t�[m����F*EM����~���q��|.m��3�(D�(\*A��s"V�m�'v@(N%�����4��h3��6,��5�g�0���d7�jn�`uF~����4�Yf2[�����Jue���wq��	a����=�9�����I��$*I����4��t�����jf��G#x��X
i8��M���8>����"��&���@+���p�r��p_f�d�l��L�Uf��Q�S~���B'��bb���d}�'�e������a��p�����n��;�P2"���&�FJ����P�P��AlNj���X�	����(Fy���D��Z�M���Q���W��8��d1�I��vh`\a_L^.����S�DD���yv�2?6.��cS��.��w;�[�������6E	B#y����IW�"��M�0��������	PK��-���PKmfpK0gather_analysis_all/gather_robert_patch/14_3.outUXK�Zu<
Z��U�N1}�W��C,_w�Q�DZZZQDA���6Vbe/��#_��-�
�<��(y=�s�o�=~\_�������?z��,zp�R��q^�1W�sM�"�KC(��r���M�|,��&�+��w�B�$���6]A��K��P��!�g����Y8��
;3��}[�iK��m�j����-v2�m��z�6���<�v���M����-J8OL����s;N�*��{�T��M������hR�X�}4a\�4���n�	�-�S��[���?c��Q
E�R	�@5��H)���!"*��<��P��������4Qc��g�Y?q�u��$�^�T{c�2���xV��I�����,����el2�3�lx�+2B��(������f�1�$��a����v�/:��������r��X�!
���F����>��m�]w�����+��7�I�y��5t��3;x�Z�I���rv���5Z��\ZM��k6����n���&�e�����g�����4�R�c���Q�-�&]��*��^�����/
bsS[��f�Vk$�f���b��3����k�U!����Q��NV��z��d�I���h`\a�m��pU�����b�:�\���T��)�DK���D(���L�r���.��E��� �������+�H�$�$�p|�g
�A�wPKl�����PK`fpK0gather_analysis_all/gather_robert_patch/14_4.outUXE�Z[<
Z��T�n�0��+��`�IF n��E�	�����	��W�'_��fg�������h�{�H�w�W���������
��8��u�8/]�I�y���.���o������E��u��:gS)�%'����$�W�?�0�C��6���3p4�f�����l����m�j�E��)�4dt���mC%�rJ���6��a?�bi���ef6�o��^g��wK��\�"����&e�u������i��c��3;'�\���f]�$��K����T%���PJ����$�_���Z�p��{I5��y��
��LJ��J�4�A��U���o��$1	L�K�
N>.c�A�A�`�[@�14C7�=�=�L{J�W��>�7�n���Y�eG�\���U�8�����SL��dR�����<�����.�tR�p{��t:]!����FwyfF�@V�6M�xYN�l�|�1����j��_���l����u7��v�T'�)�)���d�Q��)<�UM@H�t5BHE���}�@���=��B"`�"�����~�yU>����k���A�op�t/�C?���h�����yqW���	x��_N_g�K�{����?6���b����I����Z_�6��E���LZ�xc����y�|�r�z9d
��`pPKv�HD��PKSfpK0gather_analysis_all/gather_robert_patch/15_1.outUX@�ZE<
Z��Wmo�6��_q�`�o)c.������2w/����	��%��:��;J���J�v+��� ���y�w�<�M���o��������Ng����7_`���fxmS��I^������HAH�J"b(�����G�p�	�"��<q�|
��f�u��_NH��1�<����'�2��z=����0�*���A���3����0��3-K%C�#BZW�G#�&)��1��>0��s_�����:�P	��9�Q���S���������@�Z�6�[/��~(�j?��1�mf�3�
#����ln��i�a]:��K���&�n�u��������^Q
�H��GJAy���'����J���T�(dm��T�k{h3��Lr��l1��Y��f�6�3�b�������?
���8\�]�J���U@�����_�����U^�1��g&xO���:On�1�-R��)��x(�@��la���v����������RBdDi�^���J�+(D�w����X�kh�U�2�����f���XK!�U��C���U%e�S���`�>\���;��f���l�&P�&�%���,?��c?HmU�l	��zLw�`�a���B����Joh)v�
�D��S���E"��N���p���DI��
;���'=�k�r����q�>�������O���W�����XV5�+����������[��}e�����������CuM[��o������i����w�=��+� ������3{����)�BH��c�����Ma�������J
>���1[&y�� �x�Q�o3$�]>�?����`A{,��)�OG�2�}�t�%�.�9-���b{��q���l+AJL����,4��F�����`�O�z����tk��Y��y�?����!o�x0���t��s�PK`�>m��PK�fpK0gather_analysis_all/gather_robert_patch/15_2.outUXU�Z�<
Z��W[o�6~��8b
0��.4�I����.sw��W&l"��JT������d�V��[���p$��>���|6=�u
�_N���o��6��	���_�����
�����Ee'	g<�D
B�+�PP�	�faWFE�yj�����DPN�<!�u��uQl�o�.�27�z=������.�
��As��g��e��s�[4B�X��G���������s:�Lk����juzv�"T�)w6g*&1o6B�	;��*�H��@9�Q��9�E��eQo�'}7�����:�ad}S�����;
p�/���x���M��y}�MKK��E$t��P��DQ�NJ�����Oh1E	�0N�H��
K8M��=��NW:���E��p��Vf�p�?��SJ�Qv6����~�������
\����;|��g�nWEm=b���u� ����Ho�1�-R�0�$E�}�/��S�,�7�����(%��*���>V:��Ix��8��sc
����S�J�K
?&���c-�3[
E����J����%�w���bn���v5�6�$U[>0�R��y�)i�������������-�{�u��7,9����*���#V��XHD	j��	�T'�MQ�{BJcw�nL�^���'	�� ��^���5Eg�T���T�]}�1%P�w������tV��o/�c<����Z�w]1���	�$�	M<alY$C���U]C�%�S���z��0�i
?$; "�8 ��������7k�K_v�}��Z�9���&��-�\��L��0UZ��
�����6CR�����^������"��b�|tI��;��B�}NK��cClo�>�z����
���
6�|�-4��F�������bz�Y5��V��5�[��I����y���������?7PKMU)��PK�fpK0gather_analysis_all/gather_robert_patch/15_3.outUX��Z�<
Z��W�n�F}�W�C���+�+T�Dm������>�����B.�_�Y�$K���
�i@���sv.������)�~9�c�����t�'\�:�6��g6�Wq;��"+�Ds�%RR��y���0�/�z��`8_�r�'v"���	��6f�e[�
�E^�����S`0��+���ohP�����j������02���wn��>���0�"C(������?���X���M�J0����D$jXq*4
IOP�I$u!ct��Em?�Y�����6qjcg�F67E�������"v�|/-~�Y��Y�El���c!Q^Q���H��#��\���	-f����:T��.Ls���=��]����g�r�!�Z�������1b�������?
���8\�]�J���U@����r��p�d���:+]���R<����b�-n�1�R�0Q��Cb_�K�������N�n�7�$D���T���XiM�S�|�������^�+��,���i�ukI�<l-T	�
�T%e���B���>\���
i_3����h�&���6--%���4;��c?�����tK�=�������)���WzM+bG�0%�
QD��Ba����!��rwOH����nD�9��'	*B�����qk���S�^��Z��+�)���+c,�
��M������8��V�}W���+���=a���P]��E��(�B���0�)�{P���N��4����8 ���������7��U!$���v�,��r���������L,>���1��Xde�� ���Fu���v�${o�����}���'��?][��X2�2N����R+���!��Og�����oK�4���Bm_k���`S?�SLo��b��]�.F��78��lS����I/f?_����
PK]&�h��PKpfpK0gather_analysis_all/gather_robert_patch/15_4.outUX|<
Z|<
Z��Wmo�6��_q�`�oEc.�����i3w/����	��%��:��=J���J�l-�� ���y�w�<��O��_���{����?��wp����w��7��Y^��:��,+�(����HAH�J"4���b�������Q@:s�t�&f��5��$�u��U���7��<O��\MS`���+����hP�x��t���b�L�Fh�dHxDH���h?�B)"��n����������XG���l��"��q*b*�$= 5�W� ��q�:Q��[��kxin������$W����r��67���u�t
�
~����Y�F�m�������^Q
�H��GJAy���G�8�DIN�f�cJtswk�o3[��5<����}d������N��b�������?
���3�_�K�J��U@����r��p�d���2+]�>g�	��S�u��f������P4��]{b��s�i����U+tw�7%��J5z]R*c~4cX���8�k����J&_x����uk)$H[
E��o[���;�u~w1��nI��YY���)An>��4�4o��lo�}/�ea��K�=0���]�}U��!���^�R��^�0$$��Mo��������6��-!}!���U(���G=�k�4	U�����)z�N�zQ�j��/^	�Ci��*�&�r����g�1������7m15���X)��)��Eb�U�%6aM�R���@u�� 8!��+|�X���_����p�@���v�L^B�`k6����s;3�3�O|H����3��,+SO;��m����'�G3��7-�#h�E>8����bZc�b*}y�����b;{�8����6l\XDU��O�P��q�/����7	���������n��%�$V���o����������/PKB�Q��PK�fpK0gather_analysis_all/gather_robert_patch/16_1.outUXX�Z�<
Z��Vmo�6��_�ab��/�\ q��[�dv���@q[�-��6������vjo��l��x<>�=wB��~��@7���ouN���A��*�u�In���$��C�%��E��G;��������heTi��`�v��,�$����N� 4I����8&��S�J�+��n�Q�-^ee���S�E-b�=?�}?��4�l^�)N�{w]�'�C��a�|����,��;����&��4Q���'�}d��d��&L���Y%l�������m���l��YhMK��\�XQGW0d�#}���%1JK�e�;&�O��SY����Cq�����3�p�05*B��8`*��:��� �b'��d�; [�������y���s�I����M��A�Us�0a{���P�Q^�\�C����$[�,G7�0�
���}�t�����0����(^�U
\�	��.�����e�i,��������e@	��������0�a���E��P�h�H�~����M����Y����&�0FI�6�<���
Z���@	4��`L�����=X�S���Q���V ��W�c��&���[���M��D����&�T��?<��R���k�a�u���3���h]O�h�d���"��6�%�g DG�`M"�`Jk�����d�2���}k>[��:>�MV&;&�tF�����	��I��J�Z�k�Zh
�(����V�
��5���t�L@���^���pu����9� ����'��f��4M�����Q@������oV�K�\��b�C*��PW��O}6����37��S�C:����NG��$wd5��)�f|}u���w�w���������t�u���i����GY����q���op]���~����T�K��'�"���&�g%�����Z�t��LAZ�7Jk��������8^��PKN�g�PKNfpK0gather_analysis_all/gather_robert_patch/16_2.outUX<hZ<<
Z��Vmo�F�������%��w��rRB��k�B��TU�CV`��^�.�z���W9hu:Bx�w<�<��};���b��\���a�9��[]���!����-|�`*1n�(�?e�>��n>�C�`��`�\��F��S�q�d��2��z ��
B�8�J2�1�����_�GL��m�*+#�l��E�<r��������-J��ey��$�/���y��b2���pr���c��=%������(W����>rqr2Bs$Y	�������l�R��6���t6K�,p�%F�\	�iA��d{C��-Jb������ [���<E�����P^d��p�0W5�KB����h����F�})���a3����d~����e9�y.7�x��M:(58�c.�6��J�g@��%��BFe������YQd��g�e�G��W�X������n�2N�Q��p�����k�$�L�h&����%%�+_[Yy�a������XC��A#����,��i�\�%���h2
"Gmi�3#
$
cm�hm��_C�j�1�f�����W��8�h����.�M���u���}D���������!�D��?<�����}�������&�(.�]��r"��VIt�H#�
b	VTa���u0����v��i�Z��y������
���+�
�U���eG����c�d�m^��5�e%4eI4-Zh�]V�H
U��fYr(I�UQ����2���T����u� 0�}w�����Xt�3�G�B���g�#�����98�Y�R�1t������|���~x��������P��<H=t::/7��w��n��W��l|7ywq�J��_�4�n���{���=���GYO�G�8�q�78�.��������*�%J.�j�
��	�YI!�S��*����v���R=a�
��O�����PKA�h�PKtfpK0gather_analysis_all/gather_robert_patch/16_3.outUX��Z�<
Z��Vmk�F��_1����,�&i�����]�8��P�R��,��,��U���������s��}�`�J�����33�~������\���c�������2�� ��P*�y�0�����S>d�)|����=���-�l�4C���	���.��%J����u�`�d�C9�BU�������u<�-^mU�_����Z��{�~z�~<����l^�)I����0O��_LG��I�2?~\�����n��o����y�8��'c�?�+�Wf�d�wgk����YR���Yff�5+b\O��(V#.�~�(*��V��q�����$�BNmU5G��^Cyq��xTPND��9����L#�X��q�9swb�<�o��0?a�xy��J��,7�x����Z:�r�!B*��|/:���H�4:��������$[�,����c���>�/�"�_�^b}��s�%	�5\����
�b�6ykL|�|�.aB"Z�%*���FE�ylmUF�$������\c����@�`�O�@o���"��|�i�����&�����DaKS�E����5���������%���8��^�i�b;zm?��5�_�Bw�t�G���������	��/�xZ}��������:4���%�Q��'J(�����)N[Um0K	w\�U���1A��)mmEm7������7X����/p|4*r�,Mv4J�i�������
��I��J�Z�k�Zh�*V6�rP���Ga]��B���JSW���/-q�L�Y,����������l�'m������`���N��fu�T�A���+��HFT;��umk�4�	~z�g��G�P(i0������M��j�C������5�M���.�W����C
����t����/���3���s�����P�ff�?����
j�%J!)�-l��	�YE!)a8��y.>���!K�F;����m�����t�PK�6R��PK�fpK0gather_analysis_all/gather_robert_patch/16_4.outUXW�Z�<
Z��Vmo�6��_�abAR_��@�dm����C1���`[R%jm6��}G�91����;�I���<���};���b��\���a�9��[]F��"��&�p��$��.��O���O���8�P7��"\"���jI%����qy�L��^�:y��$���
�1Qu��T�����|t[����?����Z��{�~r�~4�Ei���HS�����0O���/&�^���Y?�w�)5�M�i6�\;O��&���M��%�+�J2�g�G��,)���,3����!}����_���#F1��`�$�Li]&'[����E�����P^T�I|��/jT���0M���6Rb�*vb�,������h�xYG�p����d�
��d
��8�
c�+�&l/:����1����7�����d���f��@|���aO�_]����9�)��5\��e>������4l|�|�d�Z(�d~�e@�_�
^[Y��a���������h�H�~��'��0�K�D�M�a���"
x���4��h
�����S��{��hP���5Q>�
�v��~���d�~t}��s��<�I��4����w��	4���O�w��b�o�i�
���3���h�O��ViT��,�uc�LHs�q0����v��i�Z��z��5�-��
��&+�
�U����G����s��_k�_��5�e/4�	T���l���@A_��B���N��������yVL������/<	-����.g��
�n����x�;_�� ���)�*�������&��t|���}
ByH��0�����\���F`�9���������n�����V��:�`���>���_��x�=�z~����O�� ����������Ae�D�s(c�<h�xVR%��i��;����6���o���D����/����PK$���PKkfpK0gather_analysis_all/gather_robert_patch/17_1.outUXK�Zr<
Z��U�n1}�+F�C�R,_w��T"mz�Q������`�e���$��;f/@ 	�=t^����9��c����^|��A�������-����GY�z�����MH������1���n��m3r���g��$��pB���7%Y6��Ne:~	�&NM����Lr;1����l�9�����qS�o�
��mEXV���������A�I�i[��,��y��IS;��|(c`�h�7�387���>
�
2�:|JC������Yok�v�������10���B�9#ZT8� �*�U�*(����A�D!�m��	0e|���,q6�B{�R����\�%� �S;;#I���j��;SL�����Rf�#���������q�V�pp����DT����WY:~����q��$�M��~���I!K����S����	��*uM�cJ�gED+�;L�h��=^�&#���x��Q�{5MM����kd8�RgP���}��������:���.�,���Z68����tuV�����p�q���E�4�JI�gx��� �������*~Y������]�i)��/�����	y�K�R���:�����5�]���k��7Q���Pq�<T�@���"!���Ao�l1��v��{}~����y�������(I�HT��E�@�B0�s�����P���ts���|���]�sQ������A����|�E�A���[���v��H��
TOQm�"���-QVv����5n��/L�b���-�px�f��*\�t:��C���.V�����w�����h���-\�2W��3��h�n���wZ��PKH�!�9"PKrfpK0gather_analysis_all/gather_robert_patch/17_2.outUX��Z�<
Z��U�n�0��+��v��$�F]�i�5
�E�����-D�\�n�~}���8vw;t."$j����G�?�����������it�$�'�;�K?��j&��f-)��r��{2����=���_�|2w&��(�Y7?�y��g��2=}�*�l��p0�N7������Fq��x�k�g��.o�
Ue�w�Z(F���q�9�S���yq���R�en�y�������[�3[����P� S��C
�':d������
�SWz7����`���5.���D*���]$�D�q�3��7�b��*��I�m�zW���@��bWnH��%��I���7[ ����*�-z��Y0JiCO����O��|M/6E��o����U�F�"�&�1�9n�bc�4u)\��p1���
s�<��`�a��=B�i(�bL�����x�b�����8���ea�	�||�X����\�g�����%
2����B�}>������f��u�0��y�
�r��@2���4]�U�8B�X<�]W��8�����t_�0��Z7O��}����e=Z���/�e���J|c=��a�w��O���>pa��WG��n��z��D}5�F)�c ��	��K4P��0L�A��|���n��;>��,P�9@M�����I��O���12�!���+F��H���d��Q}���l��+��2���L�G-�@~T.�--�;qst��_8�K����U���H����BY�]��N��I:����O�
���P��	�o��V�fO�w�q�e�b}���.8c$��b^v�����>A��/����].���:��PKe�y~7"PK�fpK0gather_analysis_all/gather_robert_patch/17_3.outUX��Z�<
Z��U�n�@}�+F�C�RV{�eQ�D���F)QU�	m`�����~}gY��@�4�����9gw�<K|�v:�g�������4Zp������qNc&#"!u.)�����w2����h��]�l27=&9U��P�����<_���B��_�K2�&�'��L�5��9����k��Ax����mC��M[�
����ADx�6my�S�����y13E	��23���k��^f���%���.l�X�i� ���!
E�BJ")�L�ok����97�5c8��
)�A]<��%U�xI�AA=�@�X(�g�|��Oy��1K�)��^�Td1t��YA�$3�5s��/���~�����EOx��PZ��+vR��#%/t������8���5�7y6�����q��NS����	�#�A��+Q3��9�'��F��)�bL��������G��������i��������h������ �Q�Y�*n��A���/O7��;@�00����U�����0M�{�!y���<���."��a�$�wx����������:-��r43�D�aJ
*����;��|A��
��zb������Z����QTr7���3r�C�ap�D��;
B�$��i�x_��|6����y\1G���o���'&Q��8�&�)�d�B0�3W4rGH2)p����h|���ll�=�e�dH����#?,����9^�d�u��_8��E(aQ�6I����z����Ow��M�����lbW�
��/A����
�k:�C��.��h�����t�#a��l���-m�2W_$�����6��q���PK�X��7"PKWfpK0gather_analysis_all/gather_robert_patch/17_4.outUX��ZM<
Z��U�n�@}�+F�C�RW{�wQ�D���Fi�(���
���K����6�@z�C���=��sv��I�����W8=��<M}h��O���4��a^��T3ai��B��({~$#?�	�����M�'S�cR�'2������S��3���P����&�M����q�����\�j.v5�"@xg���m#b�u[W��mE,t���mU�V�h[�e^\�����f�uA��ql��p�3%08��O�.
�[d�s����`*Q�o��nj���������c�
����Q���P^�4
������h��k*j$Q���E�0U|���,���B{�R�� �\�� M2�x7%i���j��{[N��w��Jf�#�6�t�N������Q%��jz�y/vU,���!��'�5S,��.�s�
��6�<�P�aN�'r'L�zF�4
�;�)La>�Ds�'L�X]g{6�*l6B����S�����pb��O�� �a�y�*!���
����u��}�0��i��j�����L�����Z
��<�IX����ED�5J�]�CQJ��9�>E����������2�HA%>�}�0����X�Ei��\-�������E%7��o��0q,%���0�#\��S�
�����4��]��g��-�;��������x���>�'�����0f���j�`bk��{(�f���Q����l�n*��2����n+������f$V�UL���X�}�ll�`)a��5 )2{\�
emw��;��M�����|��
�7
^��>nonr:�}��>��h��9��;F"��l����}�2�o���%B/_��\��i�~PK5�%�5"PK�fpK0gather_analysis_all/gather_robert_patch/18_1.outUXY�Z�<
Z��V�n�@}�W�[S	V{�D
R�R�V�g���������g|oS��}`^��=;����Y����|=<��N�>?c�&/���c���i�a�8��l���bjQ��*����/�j��>�Fq(��vs+�X
�q�TbpJ��V����+���|K@N�fX��W)�y(&�cJ:��:�Gw3CY�ty��E�B�\�>v�������z��������*[��U���������v�����u��}T���sw
����b�+��K0g;f[J;����03#�;������E��.���Z^��M��h��"���-�1��hE�! ��k�FX��BK!(���,��������X<�E�ZQ��Ja��e�k�a��X$c5j1$�aEwGT�c� p.��( ��4��+�8@�w>�����a�����(-��Z� 4���5@������������^P�l��m��'��34-����NWe�n��+����������wy
��)����7}0f��f��S��(
+�w9PV(:� �g����*�YW�d#�0�^@����a8��#��1���S���d�t�Di������Ge���A%�I�TG�VY�i����`�j�5�F�]2
�N�MA�TZ��j�]q%$�Z�|�b���B�"J��3w���(EY�G��S�`a8h���*T�` 7�P�!
8��X3�1��*�������,���]7H�����|Y\����WB��zZ�lshDhV���>m+���3��{��P��a���x�l���v���=W���7���XNlM����C���������L�{���z�+��w�"���,VX��+��o2+�I�w$z�br;���4�O��*UTh��K���^��X4o��pFTo�����O&PKF�����PK�fpK0gather_analysis_all/gather_robert_patch/18_2.outUXS�Z�<
Z��V]o�6}���[`%��a��.���E�`(�dh2��D��d�~��d%��8��0���xy�9��R������g�����/?0	B����1C'~�B��j��qf
1�
���bjQ]}o������f���gyh��uK+�2X�q�J"�����`v4���C���I�)�+�b%�B��r�!��t:g]����U���
�V�
YqU���_��?�4��&�,����.l��]�>�l:����aW�.WG��f����E�.`��0TSlu$L��sv ai�&l ���	33A8�Ou�^%�y��j�j�������3�G������F'�b�a��`���s�&X��B+A6�t,����g`��X<�E�NQ��JaF�2�5��I,Z%E%�T��=QT�B!1c�8�$zq(������/�����m�!�cg06J%���$@AtP�@��o�t>�����Q/>6>�"�z��M�����������z��p*�,eG���<v���P��nw���a4;t�X�OY��z����P�����Db1tMI��l�-���Qz)�
\�����85�1�_\�4�V {���"+K}N�.�$k�|��
�H�Y<P����|�LY�e��=um)5OH��y��1�������R\	!h��|�bi�WF��"+
W�sw����DU��p'���p:��p�$\��`4�QJ��P�'
	�A
�X
A�,t��e��+��&!m_^����
D����.�L���Cd��@�:���ai�pf�X��*�����C���=���.����HX�Hc�����	�V��k����������=q�>���7��R;��:r�+TKam����0�<@I���D�Q,�V��7
��u�H"f] ��2
m�:�qy<�Ho�a��W��o�,��h6�PK/�.���PKufpK0gather_analysis_all/gather_robert_patch/18_3.outUX��Z�<
Z��V�n�8}�W���%x�p���v�i������DLD���&��uM%q�����iI�9��p��>�����:9z��7Ah���o3t��> 4O�*,g�����][L-*����~�u�,5=@�$
u����ni�
^����)+�KX�@��o:+�rJ4�����H��S1������5Q�q7T�kWV�X�"$�e�S��<<{����Y���v��]����}zQ5<���(aW�.��_G�%����y��a��0TSlu$L��s�'a����'L��33A���eQ_����
���8]�����e�v�#�O�
n�� DO�(�
��I!�VM�b���Q,�lXh������!Ob�����'���5��������������QT�*A���M�l7�$0e�y�FX|�d���.�!;��Q:�C�	�f �-@��x�{�������+��k�!������W�v~g��:��7�
z��|�X�5��
���S8��B�����3���O���$l\��@YA��LA$�e(���T	8]���,0�KN0��J�����S���g�����^V�$K��A�/�?J�<��`�(�IRT'�VY�k����`�i����<"�h�
3c���t�)�L�N1�L�Z��
yB�$�\���:K���V�^�0�N�����j��4��HN:��RC�"�']kY"�����n���������"���7�������v�@4~	?��]�3�w�Zp���r>N<��N�����~�t�#a�#
EwO��6_��K��:�{`:����#��=��^
J
����1�0���
]I?R|T7D�	��$z�bv;���4�O��"�u���e%�V3tx��:x`���y
7,����s���f�_PK�kMz�PKyfpK0gather_analysis_all/gather_robert_patch/18_4.outUXQ�Z�<
Z��V�n�8}�W���%x�p���v�i������DLD���&��uM%q�����iJ�9��pF�>�����:9z��7Ah���o3t��> 4O�*,g�����][L-*����~�u�,5=@�$
u����ni9�[
[binary attachment data removed: zip archive `gather_analysis_all` containing `gather_robert_patch/*.out` EXPLAIN ANALYZE output files]
��D^s�W���]=/,$�t3�d��o�1�Bg����m�%	i��/��j�v���Ag�Ol�y,
1���KF�����n����x>�����D����>���0tth����n�.1����H����
�PK���3��PKSfpK/gather_analysis_all/gather_robert_patch/6_1.outUX@�ZF<
Z��T]o�0}�W��&�����R)��)��N����+�&�f����H�$M*���Bp����{�
����7w�`1���10�7����*��em?
y��#4����N<���F�0�iDF2�8e���>��yYn�s�YM'W���z�0[.+�F���`�)�.�gaV�:��`������0����`?�j-���Zf1��u�E�����}!*�,���1�4zQ72��n�.{�?�	��2�ke
��/����BC�!WZ*#��#��H���0P�{���|���[)@G� 	�S���Xvq'���>��J:���Jm2'�Ud:�'���"���1�n?��m�m�����l��XQl,}�*����R����:-mF�.��~7Be�8\�[��R�������y�������P�1'�9{���Lnw��l{?x��7���g��E��c�2�I��|�9g(����aU�}��=5���|f����-P��C��=����1���U"���-�k�{�_PKi��c�PKRfpK/gather_analysis_all/gather_robert_patch/6_2.outUX?�ZC<
Z��T�n�0��+��`�Em�8m�F����B"l��J���;���Ks(�C;��g���{��7�n���b>�}g`g���s�k0J��N�0%%dH��s5e��S��rw#��Zf`u���K)�'�l��b��c5�\|�Ff�E�l�,�RZ���n8���)�.����Ty�;������("��f>��	�.�r��
�4F�1x�:��6��dK�}!K���	���]���|B=���f�uGk��!�Lep�m.7�E��["
2m��*�q24&�
7>	��!J>��	
��D0\)���=��P����{�b&k�P�t4�������pWS�@��	e�\�q��f������Kt]@i�>��P���7H��������0�_Iu���
�Lv\�l|W�QKc�������:W�N�\���:+�u����h7���:�����I�v��N�O��{!z3�-�M��7�����.�����G��-�>!C6L��s��[s�����@�e#�c�W�lUR[�>t��e�0&���uW`�8�PKM�9;�PKjfpK/gather_analysis_all/gather_robert_patch/6_3.outUX��Zp<
Z��T]o�0}�W��&�l0�A�R�uRT���iO�VbLF���g��lI�>L��v���{/�������������n����?��\����6	�,dE�!%P��uB�Yef��F"5����B&�S�H�#�gt��rc�c�j:���������rY��0��a������f%�@;8r�����8b ����`��j-���Zf1��:�����d����Q��	���et)[7a��p��]��2��!�\�p�L!6�E��C*4�r��2���l����)b�Eah%t&�)'>b!�JQ@��@�)��,���V�t
Jm%���z�6��*�Et���.������#��v�v�]`��mwk#���oVecZ��R��W2U�e������e[v���F����F7��T:>��V���L�urkEL� �2vn�����Ugr��Oe���������=i�����{��h���H|����=���>�8�~���9���?af�+Pz��v�S
E���V��Q����x��O�+�H'���~PK�V,�PK�fpK/gather_analysis_all/gather_robert_patch/6_4.outUXT�Z�<
Z��T]o�0}�W��&�llc@�R�uRTe��iO�VbLF���g���I��I}��,����9>�������X.��o�����`�m&iY�����"������:!��3�N�?��Lm#s��PIp�
Y8�e�u���4�]|�F��Q�|���JZ���l�N�����v��g�L���6;HC:���`��j����4Fe1��:�1��dK�})+����m�i��n����v}�T7~��'<d����-�>)w��J��\��*Fn��;#i��
 1*������9A�5��p�_a���rb�xW'�d���k��Z��� Q�f��u���)�o���m�]����[���X[Yl}�.��������:-c[0�09.\v�/�l����jq}�`LS�J���������tS��vN��a���������&S��:�����tl������#2|c�O9&b0?��p�,w>��8b��O�q�<�	3�_�6�(�D@��=��������W� �� �-N���z�oPKB���PKQfpK/gather_analysis_all/gather_robert_patch/7_1.outUX?�ZA<
Z��Xmo�6��_�o�����K�H��mI��+�}
T�H���+�k�_��(��"9r���AD������wG#�}������A�/����F���#G�"��B�I���Yb���`�K���K~J��xZ��Z6F�hR,�*��;e�
�-5�C[/�����#���%B��$��_z������m�n������VX��a�9���eC�rs��{<A	��M��O(a�p4�nQV�^<�({qrR�������7����.^�yc�l��h����tY�V��i��k�z�D������n��V�Lt��Vmq��u���a��p;��iv����EI��'H��������s���=��%}�4��dd��b���dsfK�������W>�'>�������fE�9f*�W��Aw�@6�@�2j�f�@�4����f���5@���xr��]�y��yB+q�MC�$"'�Ym)��q�� el�e��A�� ����s�m�����8�	)���`����ZC��r�uP(����X�D��c�1Hi�/in��,���D���8m�
�Z��:�.��Q�x����������]��?-&wQ6FgW��Qs��w�gW�	c��5j������c��'#�5���!���)�����+��::�M�X@�4�Vh,��ag�.+��b-!�Z`(,Bb��@���!�Az��/�m?��.�y�r���E�����3�0��&�����)�Ve9�����V���B��k8,��0Xi*��e\�B�uY�s�W�K�
�V
@�2�r�M�9LS���f����VD�&J
��e`k���g���T
�]
��A���*�g��+mO�|9�wE�}�F�������s&��G�^��2g��WP��0��[	��.^s8O���:��+����DE�t����Rv�y0�*?��:��eY����d�p{���o�Gz6�����>��y�@o��A��X����a7+b�v�bp@�����.����ar_'��&�#d���U/����|�B�r���v
]����'k@�K���A:�>��hW��xg6s=$������;l����aA?���+�2�`�\3��,\	�}��Z	�0�S�f�,d�>w�z*M��
N�g_mS��og_�$��;���w1���Z����A�{��y�,�"����9t8������	>��Ix.%����q F��Xd�g�
�m,���m0��X%���Im�Qy��G�UP��)G
O��o���f�.aCG�`�I#�a!%&��U�5UCbeiW����q�e� ��� ]�C�&��9t���5��MD��V��m�X�2S�K��o)��K:��4&jm	5��n��b�Y���� ��V����mF�5�6N��>2y��Hd>K���� -�3!)����F���	P4����7Y�� |�^H��H����a|t�PK�r����PKdfpK/gather_analysis_all/gather_robert_patch/7_2.outUXG�Zc<
Z��XmO�F����o$RY��*'A�q��BOU?!_�"��������k;���C�N����;;���3�A�m����7��OW��o�:8����
ga��`��	e�X��%�����$���P�-g���h��E0EY8s'LB���jh�%�8��sx��C�}���CI�����'�>�\���h��-���K6��
��/���w�t�"���(���P���`F���$>� 9<>��c_L�������]:	���!|	6�Y0��}�$^d�U�{����9A6q	����.�� ����U�nk�1g�v[������'.I��iEn|�D���`�&�SrO><!X�Jcs���TRL`�/���dJm-V��e��I>�����	�m�d]0���)�\����XS��-��%zF�p3^
C~,+�|t���u����n'`�PZ<�m��9��������2�Gc��2�C� ��R�{(�\���r.~�	�-���R	��&����J\���|�q��*R��
[Q��1����0��8�Vy'c�?�TS��zh-||N3�����f<�;<��8������e>���^���a�g]��^/'�7��g}F����6�/7�#����������3��T/�h�6U�bA1�\Z���>fv���"I#�R`f8�z�X���(&�����t1��h��i��V/{�.�P0��)�u_��(�P�j�
��P[����:�ZJ���D�7k,�BY�I	�f�q�
�We��e^E��7��#B��Sd[O��i?�i���#^O���&�*���DI���l�3Q����JA���m!����{%�g������n|2h^��U�D�����J^���_C������o����k
$�V���{pzi�*�� �]�?�m���eL�,�I�&e'��,i�9%�	�3�~�.���t_���0�s����
�*�5�����fI���b�<fsZO��������0z��pz����oY��;�R�q6��O9F�|;�J�(\]��e4v�5�����B�4�.wf=�3�����#�sP��/�^�����s%U���������K#!�3�	V<O\��^��}���\��+��l���0���������v0����0,��2��(��������n^G�4���G�:�Q�rK����X'<��	�|Y�q���+��"	�Z�O���EpZ^C���Q��R�V��6����X���*.w*G5O�bTl��h���0��
RR�Pn���H���voH,-mk��^4\��q���>4nb��C�-lW#��Dx�j%�|�d�5Z`S�K�wo ����mR$Y.�f���X����Sc������j�wU����"������<�LK@�������6����r���7�0�/l��
�\�f�:t�E��7L��^|;(E�PK�iG��PK]fpK/gather_analysis_all/gather_robert_patch/7_3.outUX��ZY<
Z��X�n�6}�W�mm�!x��I��6I��.�>Z�����W���~}��$_"9r�`�����������zY������������o�]���@h0J���2K,�����`�P�~�O)�����eC4�F�"��"��S��QX�q3��L�t���v���qM���������6s�Q�:��D+�l��&��A-mPm6)7G����P��$�����c0�fe������W''��/��i���p3<��'�����`C^D�9�WL�EQZ�������x��K��v��!��`?�������S��]��Z��w���,G�Q���	]3.�E2��N��|xb��+��z��TRL`�3��`@!�$����W>�G>���n�)��.�s�*j���`�m0,!����5zF�p;AJC~,+�\:Xd���Gwy������'��w��4N"r�C���2�O�����R��Y��(�\nW���>�'��4NjBJa$1��G�5���|�Z
�%%����j�	���&��&��*��l���t��a���]���o�i�2�;4�}����������0��|4��!:��
���g}�>�ZN�?�����-*Z6�~2�\�����/���R=��5�T�����*fv�bkI6b-5V����JJ���Q�g�Y��p���<D�����p^����*C�t������>���	�7�q�z
�U�j��k)%V�
��xX���C�p�2�w���,�����5��Z+�	� N�m�V����c��y3mG6�VD�&J
��e`k���g���T
�]
o��f�g���gb{���"��p#�������_�L�9�n���m���_A��.���o%���xM�_�I���^���$*b����'t�������~&auQ��� K�oO�>�v���o�Ez������>��Y4G���A���5|�[�n�����+�a��Pg������}�wpFwu�o?B���[�b�]J0��)�)'���o�Z���KV�����}�t<�����L�����z�?���q�Z������$~�7s���X�����auf��H���AZ�-�V3l2n��g=�&�
j'���6�������a�B
evg�����/TP%8���y���i^G�����t��~�����X'<��	�I�c�q Ft�X$=s���0����<�M�2VF���F+AR[bd�.�e5Z��r�r���)F�v�*iv��0��
R#�a�,n�W������XZ��8�=��8ik6nb��C�-l_#�D�j%�|��k�T��8P��]��C|�!4����jW�6��z}cj,����Z-�����u$V<[;����d�|@4�	��|�pYk9�I�p���6B����D����w�E��'LkI����A�-����?PKz��d��PK]fpK/gather_analysis_all/gather_robert_patch/7_4.outUXE�ZZ<
Z��X�n�F}�W�[$�^��b��6��vS�A�'��6a�TH�����������HA�sE�.g���9B������������;?���o�]���@h0J���2K,�����`�P�~��)����s��h��y4AE<u�LYc���fX/��������%B��$��_z������}����u<_����f�0��������|8��=�����.���'��f8�Sw�(+/�]��8;+�S_L�������]����!|	>�E4���C:/J���4q���z�D���������V�Lt��Vm	[k��n����%�`����r�~%��!�5�*�'���)>��O�t��X���J�	L{�R)� �eLSR1�W1���~lZ����fE9���+]��[`�JJ���
=��E������#���>���c^bt��i�q�����'9����\�����L@	����������LK�}3���Q��~K��&�F���|4ZS����G�,�X�Pa���T�qaL/>+]�%M �e����A����������1��'��`���a<�����������\�?�FQ6D7��As��w�7�	C��-Z�guF�-�~2�������O����R���5�T���1X�q�!�Pa�\K��ki0�@���ZW��F1i�Ht��t���,d��&��p^���w�����M��}E	JT?���)R�-�����Rb��P����5��!\.$�f�q���m��E]E/�7�ZyJ(�q�j��4��0M�Rv��i{:�N��!�4QR�}�/[�LT<�U�R��jx7HzG3���Y���m��|���������Q���k&��G�n���^9�����]�{u�J`/��GM!^ASL�8<��s��O�"��J�E%*Y�{~��3	���A�[�~{I������H�z�}�<��q1�f���5�u[����5�fA��_��^�2(���.��rH����.��]�G��U�n����I
:��'��\+��qu�����{j���Ow5���w>��Z��
�7�#s8��r\�����z��T����rR
�3/�������XS�r��Y������iM��
N�W_m���[�^�5LRI0�{�N5�L��K����<���i��h�!�~�NA����{��^��3V	��d�[�U����Uo`�����x��b�i�6��X�%��Z-%Im���<��h�����Q�S��i���[%,`h�
J#�a�YJx�A8�V������6���E8PZ
�Z�C8k�kob����-�P'�+"�uI��^�3����������7G��:$�ak�b	5��nk�b������E�yA��B��V�Qgb)���a��LKB����2�����IG���#%�o�qr_��a�4?A���h^���
���|��o������PK6ym���PK\fpK/gather_analysis_all/gather_robert_patch/8_1.outUXD�ZX<
Z��X[O�8~�W��"-��q�2���XX���*�Z%�M:I:�����{M!�e�Bjp������E�}�_���������w>��o��LGi�P����1B��q�1T�?�c�~����X��Iz�$�*�c�`����)�0��1|���? �)��a����E>��$�i<[�H`N�VL���bz�h:�w.��<�NN������?���?:��S���E�%����.;�����JFc�Q=����~��9�;����T�@W�������J�b���jL�Hj����@�������a�e������$�=4.�$n�0)�X�)����v*��cIV<�@�r��
�8&5V��+t~������E�H�k����R�8�R�p%��q%�1
���+t�p~������6I{��������L���/�����Q��(k�m����f����f��f"zy3��/I�����,�N@V���$GX��j!9�8V���(�.�����a��g��SN��aj
\v��J���<�c�1��&Z���#Q�#��a�W92��Qrm9j�4~�U��IY�#`�[dJ7d:�{
Ah��@�1��{�"�8�+���I�/'M
�m����BEZH�1���s�G���e[x*�d�D���d�f�Jb��[em
R����
�S��nM��Xs�F?�u4L3�Vf��]<^EsaTOnq��lk�$
��P��3��Y�@k��;6�,t�ph���|Cw�$Cy�l5�A'��N�
��R*
yn7��C1Z�b���h?>�������[=�]e�����:?���B����w_�����uQ����@
�B
�n��<�BEr^������cR�����nP�k6�V������}>�!9,W$p�y�`�k}qR�	��Y�s	�����Ox&�:���PB&���uK���AW3%�ZKq�a��]i�t�{4Uy�4�B�DRA�44%��L��&��F��i�#;$���no'x��%u�U�wf�F������4@����\Y�N�E��rbW��_�FqL��KZ���������+4av�sf�n�.�>@��"	�?�p���N�������:uH�!������G7�sV|:��������[o*���5Bu�?9@��MX�x�k�y�O�z��F��|�f���K�|�����V�-�I��bJz�F�����m)��Qj;�38���0�9,fB�������'T.�j��0
�.4#c��������'�_����x��]�m��X����X���_?7�e������e����x�����z\���M��Sz\Z�<4�_�a���yu���M	��]J��r���PiE�N�,j �~����!�@�����C�O�7q��o�2a���������PK�=�l�PK�fpK/gather_analysis_all/gather_robert_patch/8_2.outUXU�Z�<
Z��X�n�8}�W���!x��i���M�n��b��&!���r���w(R���\�}X"�����9s��������������,��o��L�i�P��O�S&���q�j�������o���?�u�~9KF�L��T	k0��8��<�������5B��,���P����pX�aR�F�Jh�9
G1A�������$>�J.��=uQg�{��(;��\R��vK�X��:y1p���j���35-���(��YY	G���/	�(?$��+��+�[�1k��AM����4��f4f�V�4�2�����O^<����(�27�"�m�e2����[<�O`����)n��`n�*9�d�S
p���h�H�M$
�p��_��u��yQ6B�F)@	F-�5+�V>fVp �����p��t�E��$^9xH�e������8/@������B���YT��0��0ft��X���P�������<�>&�{�G�f5�$����������`U���elG��Q��6�Og:�LF�+���%e�g�	����l��������aK������p�Q�Y��!��H��i��i)���5���KPj	1�< ��������"b��3�"X1���a��;C~I=�-��i�j�R=i��Rh��2���P��3C�Tj,cw@�J@����"�
cj5���<��p���a,�h�f.-��z>x���(.�!m����,DA���*�0�6�L��(�X�e�,4�h�F��}A��$Cy�|1[�N�'��
���RJ�ynCo���1g�����)��tT�\�����IUX���;_]�����Bo�n�/n?��C_\��| �7
�@
�
n��4WF[��<�W.��{��M�l���c�fo�kT\���WW�&0����N�	\�������&�S�V�RG`�|<?��za����|�Jh�<k�$����h��)a���2NV�9w�����vSZ,XHJh�a��M	��&CW�9��~��i�#�Nvk���N$\%
����H���s�J��z�w���sW���/mb9�;�'l�U��R�yGk?V���@��BS��!g�����5�:�!��P��W]0���;�����O�Pv����Pg[����9>�;���������Tt��MD5������ ���V<��y����W���S3�������R+��YY~��?���D	{6%�p#l������� 
���:���pC�9�fB�M��A+�	e����>�C�����>���������I�>)~B�n|Zj3��'�&�f�������|�W���u���~+���i�.������ey,K������zv����|��[�.%��w=Nj��Pi2#
 ����>~s, W��G����g���DIB�3��a'�+����PK��N7u�PKOfpK/gather_analysis_all/gather_robert_patch/8_3.outUX
�Z><
Z��X]o�6}���[`!�-��$M�f��,Y0��Pm�bK�$�I�.E��l��c�#
D�)�{x���z�������������
�����8B��4���,�1!|�a"0v�Ci�#�Q�#��=ANP'��p��h�{���Rb��T��$If������B��8�D?5��&���x��q��F�Jxsj�b��W���y�d���.�S�tQg��0�;�/:L���\?���N��t��dP<�e'����p:7��d����$�'��0�)��t��{��@1��yl
#�2ZB#������O�>���v��u����:�����%�-L~lb��\��/TB%���1��x�3e�P	�
���#���"�V��'i���J
N`4`A�J��U��r����t{��v>�"��}
��(����SV�C_O�������m;Xm!�m�g��f��f������L��������������#�#��Rt�$�Br���T����,���a�gG���$<�l>�M"��l�y��O��P�p��h��f�D��gK#
�4Q��Rc�&��H��,O��tXGJwD��{A�G��� ��[���q�r��(�&_M���1d �X�f]���|g���Gp� �}Aq)����P���$������M��=��S�a��Q���i�op��(�%���%�\���UM�XG������UQ%����6�G�4�B�e��(��t,���v�z�a���e�,t�d�'�^C��0FI�L5[��Q����� ��<��7�@��7��k�$g�`N��)��:J;�A�2+*+T]���yq��G�w7W�_./��W7�(���BwPC��__�x��0;+W�H?[:�Y�Q4z��-��B���r�.5����y�V+P� �f�%jNML)�tZK���6����{]���$�����3�a��L	w)2�{��s(����O:���)F���0�.&h����,k�����&m���e�[����/P�������Sj�m�E���+:WF���_�i���Y	���U/��~Kk?j�v}��v����{�Yf��e}�dED�
� �Wtvs�j�~���)e�|�nw6�?�{���y��-X/'���"i)��{�R��w��9@�	�MPc|80�Y��=7�S7ZM^�L�\��[
--��}�[k�"+Q����`�|���m!�"��`�A\�SM������
�7UV�W�}Be]W��E��p��j����/��>�A�~�
��"v��R���A�W��=V�v�j��^'K����oG-�����p���������q�;m�_�2j�����X�+�����'�UL�8)9d���
$��D���j�cw�����iv�.��p^X��(2�P���q	����?PK�]VHt�PKUfpK/gather_analysis_all/gather_robert_patch/8_4.outUXA�ZI<
Z��X[O�8}�W��"-��q�2���XX���*�Z%�M:I:���s�M�BJ�<�����r���]����u~�/��<�~����������x�i^3!40W�CY�#?��G<,�9@�hP��1*��9(�XH�q�D��q�N��`���B��$�?
������h��QT������RL���B���n���r_��y>B�!���FY��6Q�tT���7�K���r���;���VyM��xHgE�9��&�`i��sT<�]�l��C�3TLT���F(��U�x(��B�������q�$fx�����,<��$n�`b���R\s�C]��J�%iX�I���#,8	�����l��EV[%�fE+U���6�!�*U��&XQ
\�f����;P��u����8/�xe`���,<��9\�I���O_Z�i���	tZ�vY��n��.�1vZ�wY��������D��#�/:Qo)9�B�T�����*��m�8�J,�����t:�M��~q�<�gt���EG�5�s$�rd�1,u�#
����[�2�!o��d:��E:��:S�!��kpB3D�j�1��1��ae!*��&_
�4����
S�1Sh!���_�G�p�ZP�����l:��s���d�f�<
b���J���4�[ Rf3/
$���d���
�k�:���3���u��%7�:��m����X����4��X��h�Z@|�r����D4�1�3��� JP� ���	��S������
|�k�&��!,r%grg�]����I{�~�<-3+d]0�������:������r~��������6�R�r�w0���9� *����E24O��<NF(>��z�b�������K
�����������~G ��XR��)���N�q�X�w�g���wl�t����B	�U�{��Xb[u�)����q��6gW�=�
M�!�:��|���(A�y42�h��\�t=�e���n����L��x���Sj�+��e����%�++�@�/��M������P�0�A������]��N�K���n��"�-]&�v�D�!��!��b���3T���);��q[�Y����N�����[�w,�.Jk*�b��2��N��������Y|���������k�jt9x�2(�*����V���/���c�Q�B>x)&�r�j��Z
��+)�-).�,�����P��M�������@X��6	b���h&�2'W��O�a�NQ�������-P�U����r_�W�U�^7~��J��;��k5^�z\�g��M.�sz��+m0���~b��T���nJ���R��yW������J'���0�-$��V>v0��B4�������]���2�uX��U�����PKjr�PK|fpK/gather_analysis_all/gather_robert_patch/9_1.outUXR�Z�<
Z��W�n�8��+�)b���R����#
2	��+Cc�Y�Hrw�o�KQ�m�q;Y7�)���s�B�����/o������74�z����CW�,�������``�NQ�}/�}�'�t(���q5�T�33�J)�	��S$�����u�+BwY��)rL�;'	�9sti�n3�Fm�/�8CiT�Y��Q��/�?�*3�����,LT���U���Y11E��Q=���L�U4�V5��U
��R3��w�;6?�69CU��^�����������0�p�<b��-N�$�a��Q9=��/�=�n��Z������zO��D9���w�[�r�c���7��k���1���hIKJu!�%��;��Iv����e�kbD�`����x"�&������}W�<Owl��n@��mw-x�����%��\+)�I����O�|�����]��2�p�L���z����B�d����9�����d��)WX�FC� W��9X���*��z,�4��Rr�Br(,7���JER����y�'�)p9���@C����+3��_���-u����8�P{���a�0`��yw�����Z�(���v���v�F�>K'��]�:�+q�,����
@^b%$\Y����@��( $��
�_A�Ml�Z-p��qq��'2�`��B�%z��
���
V���K�Pa�$1	�3���q��,EV�%h!��%TJ�  ���7!��+���A�����h����ew��,�����;b����G[���:�o���eg��q���N��S|^��=�'�#����d���E��q����6]Aa�}45�G
�RKu�>�����E=�A����ITgl��qB��&MCC�h�]l-�p��f�v��x���LUBW*��DQ=
,P����eto`�r���������Tu�tm���������2]����,{O�K>��"d�B�{�qO�J�n���|����6�}�'sVo��z0�'���h�<�o4���.�BW�\�b�t{��C5`U��	����Z���S=���K:������e��
��8�S-��������+���X���Xg�Z�������_�5i����&�����]���Q
�3�����.�x^��{e�f�o��s����PK�V�BUPKhfpK/gather_analysis_all/gather_robert_patch/9_2.outUXI�Zk<
Z��W�n�8��+�)b�/������#
2	����M�BdI#�m�E�}.EQ��G'Y7�)Q��s�D����_.o�F7W�oh��������y\!��de5�!!�c�1n�����#����j6�r����ZD	���	r��1n���-I���;��X�?#t�{L�P`J�>Ih@�2�ZsR�2�Fm���4��,��8���'��F��QQ���&*����<V� +��(q6�'��<�e�s�U��EU�A�����������^�Q����hf����6��8,xx��b��%N�$�n��Q9���/�=�n��Z�������H�$��y�H�����OE���\��U+������%-)U����L���3A$9���X���8
k}��e�A7I��<(�w-�������
���z��2�b��k���D��8��6�C�)7������\�N3EW��!BWb�"�3��.dD�F�@u��T���
\��VN��\9F��`QPc���3`	H\L� � 8��gq
�"�L1D�r��Il
\����,�%qj���q�����nc��Y'�RP
P�0l0������Y���V�c��C��P<��5�Y:�d�b�q_��f��6�+%E��E����V�@I������&����!�'��	rc��c�wh���@0�V	&(f�k�B&T�(IL�����n�(K�g�@3"C	nq8��g������&��IZ������fq?����UY��:{g��]Sw_l���[�<��'�Y��!�%S/��tj���2N�Q<}���'6$g���(�x���>z��+(�j�����Q��z�>�����E=�A����ITgl��� �����4

��Q�v����f������~1y0U	]�dpN�BTAO�w��K�X������9�s���N���
ho��WZ��0]����Z��W|����:N}���k����/���^!����wjn��H���
Nf������N��@�X��
-4eA����;���l/�`�Z"�~��6�G?�6�
�.�?�~�vV�8���FG����q��\�wM���
6�6.����>�s����>^�
i�;�V��W�������8�Hc������G3Y������n�"��i�;�z��PK����zSPK�fpK/gather_analysis_all/gather_robert_patch/9_3.outUX.�Z�<
Z��WMo�8��W�R�6�DJ\ i��m� ��(z2�6a�%�$�q��;E��?����3�����p�����\�}C��7�hu���A��,*�����!�c�1n�(O�~D�r:��������1*���@z
+
G�T�#q�f���Y�o�O��h 0%��$�#O�#Ksd�9;*���������'�$��?Pw�z��y�=[�0?��K�X��4�������m=x��2�e�V9M�e���&��C�����l~��m�Ge���������Y���a��������(	���F���b2��l7l�����J0��%1S
[B9����/;>��<{u��V
	�r����TQ+_2_~�=D���	#���+�s��v|M��n�0I4DP��w���t�F�������:��2�D�Th��H`*6B���T�_.���fn���kp��h&�
!<I0�u�F|Q����3��!���s����(��Cy�-�
�T2�p�C$������e��4J�T�����[��,�t����>��8JtT�����o7��F��%p���`Efu*QFQw[�7��a�I�;����Q�Qy�.M����v-��������K�<	W�[���o�"��3�0���IR��V%��������,�q@���V�D��E�R�/����Pa�8�1�����Q��4A����$��$�u��	'.s����B5��Sv���������nV�������/t��������l��Gw�y|�H��h,��%S��w2����y%��N���H��Y[W����!�z�f(($��q�������:���eT�x=���
������7^�}�E;�bF�m3@9��TS;Y;s�/��]��JF���2,����:V��'�`|�u=���u�����scd�Mi���/@L;^��&���s�l)A�Sw����]��%lT�q!i�����89%6C���wV��TiX]h�W����2p���\a�[��������}�Z"���P��n�*�nv]��G�x���r�o~��m��Y]r��+U���NP�q1�eV�l����x�N�5�7�F��V���7��a��#n_���z��y%M�2)�R���m�{���PK�@��~SPK�fpK/gather_analysis_all/gather_robert_patch/9_4.outUXU�Z�<
Z��W�n�8}�W���
Ln") i3�%
2	��O��&l!����6�C�}.E���8^��!��I�s�]x�����?_�}E��7ohu�������Y\!�ee5�!!�c�1^LQ�}/}���t eu�Q5�T�33���@b���I�,��^�:����x�
������%sbiNl3�Fm�/��GiT�Y��a��/�;�*3�����=��8��+�X�Y16E��a=��z��.�h��j��������C����l~2�m�GU�����������������ry�j�[�FI�����rz1�f�ji-�,@-*�p��SK� l��b����-z����������jE�USS,HKJu!�%��;�LIv�p�i�Z+L�������/Y�D�m�������y:���hi��`1���Z��g))cK�/��b)�|���3$_r��yy�}��
�4ct
p�"�����bC��]����j�A9� �S�3�p X#'DA��#�K�((�1UD�X
j���p�6��?�8�R�T���n9��$6.�v�`��%qj���p���uK���3�$N0����a�0`��y����m�j��C��P�����,�3w�����q�x������*�peQ�E��( ��@�X	�

l�`��j�������=�	p��ceH����kG�}�`Lb�/�WPb	*l�$&A��_t?�R�����M �����Pr���\�	��j�B,u��sh�N��"��y]v{����?��;�#��5u�dKw�A��
<����2��d�4��H���)>/�t�����Ge=�!�8���G��j�!}��MWP�j?�����Q��:Y��gT�@=�A��������7A���E7ibG�m+`k	�k����9=/��S���JF�P�������X��2�X�{����������T��tmlX�2�a�_C���\�M��g�+>�D�]�r��{�w~����
9�%mg�[s�{2��ehp4�g��oh�4�/4���.�BS�\�
������S
V�%B�]�V<�I�x���Wtq�u���z�
��6�S-������W�i�`]�v�bo>�LF�_�2 <���&M�f��Q�*�j���<,�>����hVv������t���,t�J�o��y����PK=R�K{SPK
�qpK@�Agather_analysis_all/UX#P
Z7P
ZPK�qpK���(@��Bgather_analysis_all/.DS_StoreUX7P
ZFP
ZPK
rpK	@�A�__MACOSX/UXXP
ZXP
ZPK
rpK@�A�__MACOSX/gather_analysis_all/UXXP
ZXP
ZPK�qpK���4x(@��%__MACOSX/gather_analysis_all/._.DS_StoreUX7P
ZFP
ZPK
�JiK @�A�gather_analysis_all/gather_head/UX
�Z&�ZPKfepK�?@B1�.(@��
	gather_analysis_all/gather_head/10_1.outUX�Z�:
ZPKrepK`s7)/(@���gather_analysis_all/gather_head/10_2.outUX�Z�:
ZPK|epK�w(b/(@��gather_analysis_all/gather_head/10_3.outUXA�Z�:
ZPKvepK��\�.(@���"gather_analysis_all/gather_head/10_4.outUX�:
Z�:
ZPK�epK��so��(@��
+gather_analysis_all/gather_head/11_1.outUX �Z�:
ZPKvepK-�����(@��Y/gather_analysis_all/gather_head/11_2.outUX�Z�:
ZPKQepK�R#���(@���3gather_analysis_all/gather_head/11_3.outUX`�Zb:
ZPK\epKy�
B��(@���7gather_analysis_all/gather_head/11_4.outUX�Zw:
ZPKaepK^_z�f'(@��=<gather_analysis_all/gather_head/12_1.outUX�Z}:
ZPKuepK��A(e'(@��	?gather_analysis_all/gather_head/12_2.outUX�Z�:
ZPKbepK�2J�d'(@���Agather_analysis_all/gather_head/12_3.outUXDP
Z:
ZPKTepKH�dRd'(@���Dgather_analysis_all/gather_head/12_4.outUXg:
Zg:
ZPK{epK���y�(@��hGgather_analysis_all/gather_head/13_1.outUX�Z�:
ZPKxepK'A�y�(@��GJgather_analysis_all/gather_head/13_2.outUX�:
Z�:
ZPKlepK���Qz�(@��&Mgather_analysis_all/gather_head/13_3.outUX��Z�:
ZPK�epK�=Zu�(@��Pgather_analysis_all/gather_head/13_4.outUX$�Z�:
ZPKiepKL�i���(@���Rgather_analysis_all/gather_head/14_1.outUX�Z�:
ZPKgepK������(@���Ugather_analysis_all/gather_head/14_2.outUX�Z�:
ZPKlepK�[�-��(@���Xgather_analysis_all/gather_head/14_3.outUX��Z�:
ZPK`epK.xw���(@���[gather_analysis_all/gather_head/14_4.outUX�Z|:
ZPKRepKN��*��(@���^gather_analysis_all/gather_head/15_1.outUX�Zc:
ZPK�epK�����(@���bgather_analysis_all/gather_head/15_2.outUX"�Z�:
ZPK�epK�$��(@���fgather_analysis_all/gather_head/15_3.outUX��Z�:
ZPKoepK�!���(@���jgather_analysis_all/gather_head/15_4.outUX�Z�:
ZPK�epK`����(@���ngather_analysis_all/gather_head/16_1.outUX%�Z�:
ZPKMepK�:M�(@���rgather_analysis_all/gather_head/16_2.outUX
�ZZ:
ZPKsepKyx~��(@���vgather_analysis_all/gather_head/16_3.outUX��Z�:
ZPK�epK�	��(@���zgather_analysis_all/gather_head/16_4.outUX#�Z�:
ZPKkepK�d�6"(@��gather_analysis_all/gather_head/17_1.outUX�Z�:
ZPKqepK����7"(@����gather_analysis_all/gather_head/17_2.outUX�Z�:
ZPK�epK����;"(@��H�gather_analysis_all/gather_head/17_3.outUX��Z�:
ZPKWepK�+6"(@����gather_analysis_all/gather_head/17_4.outUX�Zm:
ZPK�epKj(�|�(@����gather_analysis_all/gather_head/18_1.outUX&�Z�:
ZPK}epK&sW*|�(@��g�gather_analysis_all/gather_head/18_2.outUX �Z�:
ZPKtepK[T;�{�(@��I�gather_analysis_all/gather_head/18_3.outUX�Z�:
ZPKyepKJ,���(@��*�gather_analysis_all/gather_head/18_4.outUX��Z�:
ZPK�epKi�M(@���gather_analysis_all/gather_head/19_1.outUX"�Z�:
ZPK�epK"L�w#M(@����gather_analysis_all/gather_head/19_2.outUX�:
Z�:
ZPKtepK}�$� M(@���gather_analysis_all/gather_head/19_3.outUX�:
Z�:
ZPKpepK���l$W(@����gather_analysis_all/gather_head/19_4.outUX�Z�:
ZPKdepKz�	j'@��-�gather_analysis_all/gather_head/1_1.outUX�Z�:
ZPK�epKI&�!
j'@����gather_analysis_all/gather_head/1_2.outUX	�
Z�:
ZPKhepK�8q�j'@��
�gather_analysis_all/gather_head/1_3.outUXW�Z�:
ZPK}epK���j'@��q�gather_analysis_all/gather_head/1_4.outUX�Z�:
ZPKzepKT�<fm(@����gather_analysis_all/gather_head/20_1.outUX�Z�:
ZPKyepKRH�`l(@����gather_analysis_all/gather_head/20_2.outUX)�Z�:
ZPKNepK���u]l(@��p�gather_analysis_all/gather_head/20_3.outUX%�Z\:
ZPKOepK�f�Yal(@��3�gather_analysis_all/gather_head/20_4.outUX�Z^:
ZPKcepKoq����(@����gather_analysis_all/gather_head/21_1.outUX�:
Z�:
ZPKmepK�����(@���gather_analysis_all/gather_head/21_2.outUXu�Z�:
ZPK�epK������(@���gather_analysis_all/gather_head/21_3.outUXy�Z�:
ZPKZepK�c��(@��*�gather_analysis_all/gather_head/21_4.outUX~�Zs:
ZPKXepKK��
�	(@��7�gather_analysis_all/gather_head/22_1.outUX�Zo:
ZPK[epK>d6�	(@����gather_analysis_all/gather_head/22_2.outUXu:
Zu:
ZPK�epK|H3��	(@���gather_analysis_all/gather_head/22_3.outUX��Z�:
ZPKYepK���w�	(@����gather_analysis_all/gather_head/22_4.outUX�Zq:
ZPKnepK�3���'@���gather_analysis_all/gather_head/2_1.outUX�Z�:
ZPKiepK��:��'@��a�gather_analysis_all/gather_head/2_2.outUX�Z�:
ZPK�epK�[����'@����gather_analysis_all/gather_head/2_3.outUXx�Z�:
ZPKeepKC�����'@��!�gather_analysis_all/gather_head/2_4.outUX�Z�:
ZPKmepK%N�F'@���gather_analysis_all/gather_head/3_1.outUX�Z�:
ZPK�epKQ��S�F'@��pgather_analysis_all/gather_head/3_2.outUX%�Z�:
ZPKfepK�zF:�G'@��V
gather_analysis_all/gather_head/3_3.outUX��Z�:
ZPKpepK�og�F'@��?gather_analysis_all/gather_head/3_4.outUX�Z�:
ZPKkepK�w}�.~'@��)gather_analysis_all/gather_head/4_1.outUX�Z�:
ZPKXepK���+~'@���gather_analysis_all/gather_head/4_2.outUX�Zp:
ZPKcepKg��('@��Lgather_analysis_all/gather_head/4_3.outUX��Z�:
ZPKUepK��.~'@���!gather_analysis_all/gather_head/4_4.outUX�Zj:
ZPKVepK)+�T��'@��l%gather_analysis_all/gather_head/5_1.outUX�Zk:
ZPKWepK�g:�~'@��v+gather_analysis_all/gather_head/5_2.outUX�Zn:
ZPKwepKFw���~'@��y1gather_analysis_all/gather_head/5_3.outUX��Z�:
ZPK[epK��u��~'@��~7gather_analysis_all/gather_head/5_4.outUX�Zv:
ZPKSepK����'@���=gather_analysis_all/gather_head/6_1.outUXe:
Ze:
ZPKPepK'C���'@��@gather_analysis_all/gather_head/6_2.outUX�Z`:
ZPKjepKf�S�'@���Bgather_analysis_all/gather_head/6_3.outUX��Z�:
ZPK�epKax6"�'@��Egather_analysis_all/gather_head/6_4.outUX!�Z�:
ZPKPepK�a����'@���Ggather_analysis_all/gather_head/7_1.outUX�Z_:
ZPKdepKJ� o��'@��~Mgather_analysis_all/gather_head/7_2.outUX�Z�:
ZPK]epK�:���'@��zSgather_analysis_all/gather_head/7_3.outUX��Zz:
ZPK`epK�����'@��xYgather_analysis_all/gather_head/7_4.outUX�Z{:
ZPK\epK��eMq�'@��u_gather_analysis_all/gather_head/8_1.outUX�Zx:
ZPK�epK"���u�'@��Kegather_analysis_all/gather_head/8_2.outUX!�Z�:
ZPKOepKaB�n�'@��%kgather_analysis_all/gather_head/8_3.outUX�Z]:
ZPKUepK.F|�y�'@���pgather_analysis_all/gather_head/8_4.outUXi:
Zi:
ZPK{epK5dY�zU'@���vgather_analysis_all/gather_head/9_1.outUX�Z�:
ZPKhepKQ��|T'@���{gather_analysis_all/gather_head/9_2.outUX�Z�:
ZPK�epKO��z�T'@����gather_analysis_all/gather_head/9_3.outUX5�Z�:
ZPK�epK!H{�|S'@��{�gather_analysis_all/gather_head/9_4.outUX"�Z�:
ZPK
�KiK(@�A\�gather_analysis_all/gather_robert_patch/UX=�Zf�ZPK�LiK�����01@����gather_analysis_all/gather_robert_patch/.DS_StoreUXm
Z��ZPK
rpK1@�A��__MACOSX/gather_analysis_all/gather_robert_patch/UXXP
ZXP
ZPK�LiK���4x<@���__MACOSX/gather_analysis_all/gather_robert_patch/._.DS_StoreUXm
Z��ZPKffpK���6/0@����gather_analysis_all/gather_robert_patch/10_1.outUXh<
Zh<
ZPKsfpK&�LP/0@��T�gather_analysis_all/gather_robert_patch/10_2.outUX�<
Z�<
ZPK|fpK2�;-�.0@����gather_analysis_all/gather_robert_patch/10_3.outUXH�Z�<
ZPKwfpKl`\�/0@��u�gather_analysis_all/gather_robert_patch/10_4.outUXP�Z�<
ZPK�fpK�����0@����gather_analysis_all/gather_robert_patch/11_1.outUXS�Z�<
ZPKwfpK�4(E��0@��I�gather_analysis_all/gather_robert_patch/11_2.outUXP�Z�<
ZPKRfpK,w���0@����gather_analysis_all/gather_robert_patch/11_3.outUXY�ZD<
ZPK\fpK������0@����gather_analysis_all/gather_robert_patch/11_4.outUXD�ZW<
ZPK`fpK����b!0@��G�gather_analysis_all/gather_robert_patch/12_1.outUXF�Z\<
ZPKvfpK�|�_!0@���gather_analysis_all/gather_robert_patch/12_2.outUX=P
Z�<
ZPKafpK��'{c!0@����gather_analysis_all/gather_robert_patch/12_3.outUXu�Z^<
ZPKTfpKl��A`!0@����gather_analysis_all/gather_robert_patch/12_4.outUX@�ZG<
ZPK{fpK�1�@x�0@����gather_analysis_all/gather_robert_patch/13_1.outUXR�Z�<
ZPKyfpKO��*s�0@��i�gather_analysis_all/gather_robert_patch/13_2.outUX�Z�<
ZPKmfpK]RCw�0@��J�gather_analysis_all/gather_robert_patch/13_3.outUXL�Zv<
ZPK�fpK�DOv�0@��/�gather_analysis_all/gather_robert_patch/13_4.outUXW�Z�<
ZPKjfpK�����0@���gather_analysis_all/gather_robert_patch/14_1.outUXJ�Zo<
ZPKgfpK��-���0@��#�gather_analysis_all/gather_robert_patch/14_2.outUX��Zj<
ZPKmfpKl�����0@��3�gather_analysis_all/gather_robert_patch/14_3.outUXK�Zu<
ZPK`fpKv�HD��0@��D�gather_analysis_all/gather_robert_patch/14_4.outUXE�Z[<
ZPKSfpK`�>m��0@��N�gather_analysis_all/gather_robert_patch/15_1.outUX@�ZE<
ZPK�fpKMU)��0@��H�gather_analysis_all/gather_robert_patch/15_2.outUXU�Z�<
ZPK�fpK]&�h��0@��D�gather_analysis_all/gather_robert_patch/15_3.outUX��Z�<
ZPKpfpKB�Q��0@��:�gather_analysis_all/gather_robert_patch/15_4.outUX|<
Z|<
ZPK�fpKN�g�0@��0�gather_analysis_all/gather_robert_patch/16_1.outUXX�Z�<
ZPKNfpKA�h�0@��K�gather_analysis_all/gather_robert_patch/16_2.outUX<hZ<<
ZPKtfpK�6R��0@��h�gather_analysis_all/gather_robert_patch/16_3.outUX��Z�<
ZPK�fpK$���0@���gather_analysis_all/gather_robert_patch/16_4.outUXW�Z�<
ZPKkfpKH�!�9"0@���gather_analysis_all/gather_robert_patch/17_1.outUXK�Zr<
ZPKrfpKe�y~7"0@��I
gather_analysis_all/gather_robert_patch/17_2.outUX��Z�<
ZPK�fpK�X��7"0@���
gather_analysis_all/gather_robert_patch/17_3.outUX��Z�<
ZPKWfpK5�%�5"0@���gather_analysis_all/gather_robert_patch/17_4.outUX��ZM<
ZPK�fpKF�����0@��6gather_analysis_all/gather_robert_patch/18_1.outUXY�Z�<
ZPK�fpK/�.���0@��$gather_analysis_all/gather_robert_patch/18_2.outUXS�Z�<
ZPKufpK�kMz�0@��gather_analysis_all/gather_robert_patch/18_3.outUX��Z�<
ZPKyfpK�2��z�0@��� gather_analysis_all/gather_robert_patch/18_4.outUXQ�Z�<
ZPK�fpKM���"M0@���$gather_analysis_all/gather_robert_patch/19_1.outUXV�Z�<
ZPK�fpKZ�P!M0@��r(gather_analysis_all/gather_robert_patch/19_2.outUXV�Z�<
ZPKufpK4��M0@��,gather_analysis_all/gather_robert_patch/19_3.outUXO�Z�<
ZPKqfpK[�Cv!M0@���/gather_analysis_all/gather_robert_patch/19_4.outUX�Z}<
ZPKdfpK4N��j/@��3gather_analysis_all/gather_robert_patch/1_1.outUX2hZd<
ZPK�fpK���uj/@���5gather_analysis_all/gather_robert_patch/1_2.outUX�
Z�<
ZPKifpKy�j/@��8gather_analysis_all/gather_robert_patch/1_3.outUXI�Zm<
ZPK}fpK����j/@��r:gather_analysis_all/gather_robert_patch/1_4.outUXS�Z�<
ZPKzfpK���Xel0@���<gather_analysis_all/gather_robert_patch/20_1.outUXR�Z�<
ZPKzfpK�Ybl0@���Bgather_analysis_all/gather_robert_patch/20_2.outUX3�Z�<
ZPKOfpKRC?dl0@���Hgather_analysis_all/gather_robert_patch/20_3.outUX�Z=<
ZPKPfpK�jQnZl0@��XNgather_analysis_all/gather_robert_patch/20_4.outUX;�Z@<
ZPKbfpK��c���0@�� Tgather_analysis_all/gather_robert_patch/21_1.outUX`<
Z`<
ZPKnfpKl�'���0@��-Ygather_analysis_all/gather_robert_patch/21_2.outUX\�Zx<
ZPK�fpK7�`(��0@��B^gather_analysis_all/gather_robert_patch/21_3.outUXY�Z�<
ZPKZfpK�����0@��Wcgather_analysis_all/gather_robert_patch/21_4.outUXD�ZS<
ZPKXfpK%����	0@��vhgather_analysis_all/gather_robert_patch/22_1.outUXB�ZO<
ZPKZfpK�d�	0@���kgather_analysis_all/gather_robert_patch/22_2.outUX��ZT<
ZPK�fpKe'�~�	0@��wogather_analysis_all/gather_robert_patch/22_3.outUXW�Z�<
ZPKYfpK*�[H�	0@���rgather_analysis_all/gather_robert_patch/22_4.outUXC�ZR<
#44Ants Aasma
ants.aasma@eesti.ee
In reply to: Robert Haas (#42)
Re: [HACKERS] [POC] Faster processing at Gather node

On Thu, Nov 16, 2017 at 6:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The problem here is that we have no idea how big the queue needs to
be. The workers will always be happy to generate tuples faster than
the leader can read them, if that's possible, but it will only
sometimes help performance to let them do so. I think in most cases
we'll end up allocating the local queue - because the workers can
generate faster than the leader can read - but only occasionally will
it make anything faster.

For the Gather Merge driven by Parallel Index Scan case it seems to me
that the correct queue size is one that can store two index pages
worth of tuples. Additional space will always help buffer any
performance variations, but there should be a step function somewhere
around 1+1/n_workers pages. I wonder if the queue could be dynamically
sized based on the driving scan. With some limits of course as parent
nodes to the parallel index scan can increase the row count by
arbitrary amounts.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
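[Editor's note: the step-function heuristic Ants describes can be sketched in C. The function name and the integer rounding below are illustrative assumptions for this sketch, not anything from the PostgreSQL source: it sizes the queue for roughly 1 + 1/n_workers pages worth of tuples, rounded up.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the suggested queue-sizing heuristic: hold one full index page
 * worth of tuples plus 1/n_workers of a page, so workers can fill one page
 * worth while the leader drains another.  ceil(t + t/n) == t + ceil(t/n)
 * for integer t, so no floating point is needed.  Purely illustrative.
 */
static size_t
suggested_queue_tuples(size_t tuples_per_page, size_t n_workers)
{
	/* (1 + 1/n_workers) pages, rounded up to whole tuples */
	return tuples_per_page + (tuples_per_page + n_workers - 1) / n_workers;
}
```

For example, with 100 tuples per page and 4 workers this suggests buffering 125 tuples.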

#45Robert Haas
robertmhaas@gmail.com
In reply to: Ants Aasma (#44)
Re: [HACKERS] [POC] Faster processing at Gather node

On Thu, Nov 16, 2017 at 10:23 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

For the Gather Merge driven by Parallel Index Scan case it seems to me
that the correct queue size is one that can store two index pages
worth of tuples. Additional space will always help buffer any
performance variations, but there should be a step function somewhere
around 1+1/n_workers pages. I wonder if the queue could be dynamically
sized based on the driving scan. With some limits of course as parent
nodes to the parallel index scan can increase the row count by
arbitrary amounts.

Currently, Gather Merge can store 100 tuples + as much more stuff as
fits in a 64kB queue. That should already be more than 2 index pages,
I would think, although admittedly I haven't tested.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#37)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Fri, Nov 10, 2017 at 8:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I am seeing the assertion failure as below on executing the above
mentioned Create statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
This probably means the server terminated abnormally

OK, I see it now. Not sure why I couldn't reproduce this before.

I think the problem is not actually with the code that I just wrote.
What I'm seeing is that the slot descriptor's tdhasoid value is false
for both the funnel slot and the result slot; therefore, we conclude
that no projection is needed to remove the OIDs. That seems to make
sense: if the funnel slot doesn't have OIDs and the result slot
doesn't have OIDs either, then we don't need to remove them.
Unfortunately, even though the funnel slot descriptor is marked
tdhasoid = false, the tuples being stored there actually do have
OIDs. And that is because they are coming from the underlying
sequential scan, which *also* has OIDs despite the fact that tdhasoid
for its slot is false.

This had me really confused until I realized that there are two
processes involved. The problem is that we don't pass eflags down to
the child process -- so in the user backend, everybody agrees that
there shouldn't be OIDs anywhere, because EXEC_FLAG_WITHOUT_OIDS is
set. In the parallel worker, however, it's not set, so the worker
feels free to do whatever comes naturally, and in this test case that
happens to be returning tuples with OIDs. Patch for this attached.

I also noticed that the code that initializes the funnel slot is using
its own PlanState rather than the outer plan's PlanState to call
ExecContextForcesOids. I think that's formally incorrect, because the
goal is to end up with a slot that is the same as the outer plan's
slot. It doesn't matter because ExecContextForcesOids doesn't care
which PlanState it gets passed, but the comments in
ExecContextForcesOids imply that someday it might, so perhaps it's
best to clean that up. Patch for this attached, too.

- if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
+ if (!ExecContextForcesOids(outerPlanState(gatherstate), &hasoid))
  hasoid = false;

Don't we need a similar change in nodeGatherMerge.c (in function
ExecInitGatherMerge)?

And here are the other patches again, too.

The 0001* patch doesn't apply; please find the attached rebased
version, which I have used to verify the patch.

Now, along with 0001* and 0002*, 0003-skip-gather-project-v2 looks
good to me. I think we can proceed with the commit of 0001*~0003*
patches unless somebody else has any comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-pass-eflags-to-worker-v2.patchapplication/octet-stream; name=0001-pass-eflags-to-worker-v2.patchDownload
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 2ead32d..53c5254 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -69,6 +69,7 @@ typedef struct FixedParallelExecutorState
 {
 	int64		tuples_needed;	/* tuple bound, see ExecSetTupleBound */
 	dsa_pointer param_exec;
+	int			eflags;
 } FixedParallelExecutorState;
 
 /*
@@ -647,6 +648,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	fpes = shm_toc_allocate(pcxt->toc, sizeof(FixedParallelExecutorState));
 	fpes->tuples_needed = tuples_needed;
 	fpes->param_exec = InvalidDsaPointer;
+	fpes->eflags = estate->es_top_eflags;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
 
 	/* Store query string */
@@ -1224,7 +1226,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	area = dsa_attach_in_place(area_space, seg);
 
 	/* Start up the executor */
-	ExecutorStart(queryDesc, 0);
+	ExecutorStart(queryDesc, fpes->eflags);
 
 	/* Special executor initialization steps for parallel workers */
 	queryDesc->planstate->state->es_query_dsa = area;
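[Editor's note: the mechanism of the fix above can be modeled in a few lines of standalone C. The struct and function names here are illustrative stand-ins, not the real PostgreSQL types: the leader serializes a small fixed-size struct into shared memory, and the worker reads its executor flags back out instead of hard-coding 0. With eflags = 0 the worker never sees flags such as EXEC_FLAG_WITHOUT_OIDS and can produce tuples the leader doesn't expect.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for FixedParallelExecutorState; not the real struct. */
typedef struct FixedState
{
	int64_t tuples_needed;
	int     eflags;
} FixedState;

/* Leader side: copy the fixed state into a shared buffer
 * (analogous to shm_toc_allocate + shm_toc_insert). */
static void
leader_store(char *shm, int64_t tuples_needed, int eflags)
{
	FixedState fs = { tuples_needed, eflags };

	memcpy(shm, &fs, sizeof(fs));
}

/* Worker side: read the flags back out (analogous to shm_toc_lookup)
 * and hand them to executor startup instead of assuming 0. */
static int
worker_read_eflags(const char *shm)
{
	FixedState fs;

	memcpy(&fs, shm, sizeof(fs));
	return fs.eflags;		/* would be passed to ExecutorStart() */
}
```

The point of the patch is exactly this round trip: whatever `es_top_eflags` the leader computed is what the worker's executor sees.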
#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#46)
Re: [HACKERS] [POC] Faster processing at Gather node

On Sat, Nov 18, 2017 at 7:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 10, 2017 at 8:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I am seeing the assertion failure as below on executing the above
mentioned Create statement:

- if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
+ if (!ExecContextForcesOids(outerPlanState(gatherstate), &hasoid))
hasoid = false;

Don't we need a similar change in nodeGatherMerge.c (in function
ExecInitGatherMerge)?

And here are the other patches again, too.

The 0001* patch doesn't apply; please find the attached rebased
version, which I have used to verify the patch.

Now, along with 0001* and 0002*, 0003-skip-gather-project-v2 looks
good to me. I think we can proceed with the commit of 0001*~0003*
patches unless somebody else has any comments.

I see that you have committed 0001* and 0002* patches, so continuing my review.

Review of 0006-remove-memory-leak-protection-v1

remove-memory-leak-protection-v1.patch removes the memory leak
protection that Tom installed upon discovering that the original
version of tqueue.c leaked memory like crazy. I think that it
shouldn't do that any more, courtesy of
6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we
can avoid a whole lot of tuple copying in Gather Merge and a much more
modest amount of overhead in Gather. Since my test case exercised
Gather Merge, this bought ~400 ms or so.

I think Tom didn't install memory protection in nodeGatherMerge.c
related to an additional copy of the tuple. I could see it is present in
the original commit of Gather Merge
(355d3993c53ed62c5b53d020648e4fbcfbf5f155). Tom just avoided applying
heap_copytuple to a null tuple in his commit
04e9678614ec64ad9043174ac99a25b1dc45233a. I am not sure whether you
are just referring to the protection Tom added in nodeGather.c. If
so, it is not clear from your mail.

I think we don't need the additional copy of the tuple in
nodeGatherMerge.c, and your patch seems to be doing the right thing.
However, after your changes, it looks slightly odd that
gather_merge_clear_tuples() is explicitly calling heap_freetuple for
the tuples allocated by tqueue.c, maybe we can add some comment. It
was much clear before this patch as nodeGatherMerge.c was explicitly
copying the tuples that it is freeing.

Isn't it better to explicitly call gather_merge_clear_tuples in
ExecEndGatherMerge so that we can free the memory for tuples allocated
in this node rather than relying on reset/free of MemoryContext in
which those tuples are allocated?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#48Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Andres Freund (#40)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Thu, Nov 16, 2017 at 12:24 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-15 13:48:18 -0500, Robert Haas wrote:

I think that we need a little bit deeper analysis here to draw any
firm conclusions.

Indeed.

I suspect that one factor is that many of the queries actually send
very few rows through the Gather.

Yep. I kinda wonder if the same result would be present if the
benchmarks were run with parallel_leader_participation. The theory
being that what we're seeing is just that the leader doesn't accept
any tuples, and the large queue size just helps because workers can
run for longer.

I ran Q12 with parallel_leader_participation = off and could not get
any performance improvement with the patches given by Robert. The
result was the same for head as well. The query plan also remained
unaffected by the value of this parameter.

Here are the details of the experiment,
TPC-H scale factor = 20,
work_mem = 1GB
random_page_cost = seq_page_cost = 0.1
max_parallel_workers_per_gather = 4

PG commit: 745948422c799c1b9f976ee30f21a7aac050e0f3

Please find the attached file for the explain analyse output for
either values of parallel_leader_participation and patches.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

plp_patch.txttext/plain; charset=US-ASCII; name=plp_patch.txtDownload
#49Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#47)
Re: [HACKERS] [POC] Faster processing at Gather node

On Wed, Nov 22, 2017 at 8:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

remove-memory-leak-protection-v1.patch removes the memory leak
protection that Tom installed upon discovering that the original
version of tqueue.c leaked memory like crazy. I think that it
shouldn't do that any more, courtesy of
6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we
can avoid a whole lot of tuple copying in Gather Merge and a much more
modest amount of overhead in Gather. Since my test case exercised
Gather Merge, this bought ~400 ms or so.

I think Tom didn't install memory protection in nodeGatherMerge.c
related to an additional copy of the tuple. I could see it is present in
the original commit of Gather Merge
(355d3993c53ed62c5b53d020648e4fbcfbf5f155). Tom just avoided applying
heap_copytuple to a null tuple in his commit
04e9678614ec64ad9043174ac99a25b1dc45233a. I am not sure whether you
are just referring to the protection Tom added in nodeGather.c. If
so, it is not clear from your mail.

Yes, that's what I mean. What got done for Gather Merge was motivated
by what Tom did for Gather. Sorry for not expressing the idea more
precisely.

I think we don't need the additional copy of the tuple in
nodeGatherMerge.c, and your patch seems to be doing the right thing.
However, after your changes, it looks slightly odd that
gather_merge_clear_tuples() is explicitly calling heap_freetuple for
the tuples allocated by tqueue.c, maybe we can add some comment. It
was much clear before this patch as nodeGatherMerge.c was explicitly
copying the tuples that it is freeing.

OK, I can add a comment about that.

Isn't it better to explicitly call gather_merge_clear_tuples in
ExecEndGatherMerge so that we can free the memory for tuples allocated
in this node rather than relying on reset/free of MemoryContext in
which those tuples are allocated?

Generally relying on reset/free of a MemoryContext is cheaper.
Typically you only want to free manually if the freeing would
otherwise not happen soon enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
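[Editor's note: the trade-off Robert mentions — resetting a memory context beats freeing objects one by one — can be illustrated with a toy region (arena) allocator. PostgreSQL's MemoryContext machinery is far more sophisticated; the names and sizes below are assumptions made for this sketch, not its API.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy arena: allocations bump a pointer inside one buffer, and "freeing"
 * everything is a single O(1) reset, instead of one pfree() per tuple.
 * A real allocator would chain additional blocks when this one fills.
 */
typedef struct Arena
{
	char    buf[4096];
	size_t  used;
} Arena;

static void *
arena_alloc(Arena *a, size_t n)
{
	void   *p;

	if (a->used + n > sizeof(a->buf))
		return NULL;			/* out of space in this toy */
	p = a->buf + a->used;
	a->used += n;
	return p;
}

static void
arena_reset(Arena *a)
{
	a->used = 0;			/* releases every allocation at once */
}
```

This is why relying on context reset/free is generally cheaper: explicit per-tuple frees only pay off when the memory must be released well before the context goes away.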

#50Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#49)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Sat, Nov 25, 2017 at 9:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 22, 2017 at 8:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

remove-memory-leak-protection-v1.patch removes the memory leak
protection that Tom installed upon discovering that the original
version of tqueue.c leaked memory like crazy. I think that it
shouldn't do that any more, courtesy of
6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we
can avoid a whole lot of tuple copying in Gather Merge and a much more
modest amount of overhead in Gather. Since my test case exercised
Gather Merge, this bought ~400 ms or so.

I think Tom didn't install memory protection in nodeGatherMerge.c
related to an additional copy of the tuple. I could see it is present in
the original commit of Gather Merge
(355d3993c53ed62c5b53d020648e4fbcfbf5f155). Tom just avoided applying
heap_copytuple to a null tuple in his commit
04e9678614ec64ad9043174ac99a25b1dc45233a. I am not sure whether you
are just referring to the protection Tom added in nodeGather.c. If
so, it is not clear from your mail.

Yes, that's what I mean. What got done for Gather Merge was motivated
by what Tom did for Gather. Sorry for not expressing the idea more
precisely.

I think we don't need the additional copy of the tuple in
nodeGatherMerge.c, and your patch seems to be doing the right thing.
However, after your changes, it looks slightly odd that
gather_merge_clear_tuples() is explicitly calling heap_freetuple for
the tuples allocated by tqueue.c, maybe we can add some comment. It
was much clear before this patch as nodeGatherMerge.c was explicitly
copying the tuples that it is freeing.

OK, I can add a comment about that.

Sure, I think apart from that the patch looks good to me. I think a
good test of this patch could be to try to pass many tuples via gather
merge and see if there is any leak in memory. Do you have any other
ideas?

Isn't it better to explicitly call gather_merge_clear_tuples in
ExecEndGatherMerge so that we can free the memory for tuples allocated
in this node rather than relying on reset/free of MemoryContext in
which those tuples are allocated?

Generally relying on reset/free of a MemoryContext is cheaper.
Typically you only want to free manually if the freeing would
otherwise not happen soon enough.

Yeah, and I think something like that can happen after your patch
because now the memory for tuples returned via TupleQueueReaderNext
will be allocated in ExecutorState and that can last for a long time.
I think it is better to free the memory, but we can leave it if you
don't feel it's important. In any case, I have written a patch; see if
you think it makes sense.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

release_memory_at_gather_merge_shutdown_v1.patchapplication/octet-stream; name=release_memory_at_gather_merge_shutdown_v1.patchDownload
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 4a8a59e..2843e42 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -311,6 +311,12 @@ ExecShutdownGatherMerge(GatherMergeState *node)
 static void
 ExecShutdownGatherMergeWorkers(GatherMergeState *node)
 {
+	/*
+	 * Free any unused tuples, so we don't leak memory across rescans or after
+	 * shutdown.
+	 */
+	gather_merge_clear_tuples(node);
+
 	if (node->pei != NULL)
 		ExecParallelFinish(node->pei);
 
@@ -335,9 +341,6 @@ ExecReScanGatherMerge(GatherMergeState *node)
 	/* Make sure any existing workers are gracefully shut down */
 	ExecShutdownGatherMergeWorkers(node);
 
-	/* Free any unused tuples, so we don't leak memory across rescans */
-	gather_merge_clear_tuples(node);
-
 	/* Mark node so that shared state will be rebuilt at next call */
 	node->initialized = false;
 	node->gm_initialized = false;
#51Michael Paquier
michael.paquier@gmail.com
In reply to: Amit Kapila (#50)
Re: [HACKERS] [POC] Faster processing at Gather node

On Sun, Nov 26, 2017 at 5:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, and I think something like that can happen after your patch
because now the memory for tuples returned via TupleQueueReaderNext
will be allocated in ExecutorState and that can last for a long time.
I think it is better to free the memory, but we can leave it if you
don't feel it's important. In any case, I have written a patch; see if
you think it makes sense.

OK. I can see some fresh and unreviewed patches, so I have moved this to the next CF.
--
Michael

#52Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#50)
Re: [HACKERS] [POC] Faster processing at Gather node

On Sun, Nov 26, 2017 at 3:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, and I think something like that can happen after your patch
because now the memory for tuples returned via TupleQueueReaderNext
will be allocated in ExecutorState and that can last for a long time.
I think it is better to free the memory, but we can leave it if you
don't feel it's important. In any case, I have written a patch; see if
you think it makes sense.

Well, I don't really know. My intuition is that in most cases after
ExecShutdownGatherMergeWorkers() we will very shortly thereafter call
ExecutorEnd() and everything will go away. Maybe that's wrong, but
Tom put that call where it is in
2d44c58c79aeef2d376be0141057afbb9ec6b5bc, and he could have put it
inside ExecShutdownGatherMergeWorkers() instead. Now maybe he didn't
consider that approach, but Tom is usually smart about stuff like
that...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#52)
Re: [HACKERS] [POC] Faster processing at Gather node

On Fri, Dec 1, 2017 at 8:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Nov 26, 2017 at 3:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, and I think something like that can happen after your patch
because now the memory for tuples returned via TupleQueueReaderNext
will be allocated in ExecutorState and that can last for a long time.
I think it is better to free the memory, but we can leave it if you
don't feel it's important. In any case, I have written a patch; see if
you think it makes sense.

Well, I don't really know. My intuition is that in most cases after
ExecShutdownGatherMergeWorkers() we will very shortly thereafter call
ExecutorEnd() and everything will go away.

I thought there are some cases (though fewer) where we want to shut
down the nodes (ExecShutdownNode) earlier and release the resources
sooner. However, if you are not completely sure about this change,
then we can leave it as is. Thanks for sharing your thoughts.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#54Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#53)
2 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Sun, Dec 3, 2017 at 10:30 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought there are some cases (though fewer) where we want to shut
down the nodes (ExecShutdownNode) earlier and release the resources
sooner. However, if you are not completely sure about this change,
then we can leave it as is. Thanks for sharing your thoughts.

OK, thanks. I committed that patch, after first running 100 million
tuples through a Gather over and over again to test for leaks.
Hopefully I haven't missed anything here, but it looks like it's
solid. Here once again are the remaining patches. While the
already-committed patches are nice, these two are the ones that
actually produced big improvements in my testing, so it would be good
to move them along. Any reviews appreciated.
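[Editor's note: the core idea of the 0001 patch below — single-producer/single-consumer counters as plain atomics with explicit barriers instead of a spinlock — can be modeled with C11 `<stdatomic.h>`. This is an illustrative sketch, not the shm_mq code: names, the tiny ring size, and the byte-at-a-time copies are assumptions made for brevity.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * SPSC ring buffer: bytes_read is advanced only by the receiver and
 * bytes_written only by the sender, so neither update needs a lock;
 * barriers order the counter updates against the data copies, as in
 * the patch's comments about pairing barriers.
 */
#define RING_SIZE 8				/* tiny, to make wraparound easy to test */

typedef struct Ring
{
	_Atomic uint64_t bytes_read;	/* advanced only by the receiver */
	_Atomic uint64_t bytes_written; /* advanced only by the sender */
	char    data[RING_SIZE];
} Ring;

static size_t
ring_send(Ring *rq, const char *src, size_t n)
{
	uint64_t	rb = atomic_load(&rq->bytes_read);
	uint64_t	wb = atomic_load(&rq->bytes_written);
	size_t		avail = RING_SIZE - (size_t) (wb - rb);
	size_t		i;

	if (n > avail)
		n = avail;
	/* Order the data writes after the read of bytes_read above. */
	atomic_thread_fence(memory_order_seq_cst);
	for (i = 0; i < n; i++)
		rq->data[(wb + i) % RING_SIZE] = src[i];
	/* Publish the data before advancing the counter. */
	atomic_store_explicit(&rq->bytes_written, wb + n, memory_order_release);
	return n;
}

static size_t
ring_recv(Ring *rq, char *dst, size_t n)
{
	/* Acquire pairs with the sender's release store of bytes_written. */
	uint64_t	wb = atomic_load_explicit(&rq->bytes_written,
										  memory_order_acquire);
	uint64_t	rb = atomic_load(&rq->bytes_read);
	size_t		used = (size_t) (wb - rb);
	size_t		i;

	if (n > used)
		n = used;
	for (i = 0; i < n; i++)
		dst[i] = rq->data[(rb + i) % RING_SIZE];
	/* Order the data reads before handing the space back to the sender. */
	atomic_store_explicit(&rq->bytes_read, rb + n, memory_order_release);
	return n;
}
```

Because only one side ever writes each counter, a plain atomic store suffices where the old code took the spinlock, which is where the patch's speedup comes from.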

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-shm-mq-less-spinlocks-v2.patchapplication/octet-stream; name=0001-shm-mq-less-spinlocks-v2.patchDownload
From bae1bea8d1af58683a62203019b78a1541116747 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:53 +0100
Subject: [PATCH 1/2] shm-mq-less-spinlocks-v2

---
 src/backend/storage/ipc/shm_mq.c | 237 +++++++++++++++++++--------------------
 1 file changed, 116 insertions(+), 121 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 770559a03e..75c6bbd4fb 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,27 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +71,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +150,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +179,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +349,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +490,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterparty is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
 		/* Bail out if the queue has been detached. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +921,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mqh->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +953,18 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
+
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
 
-			/* Write as much data as we can via a single memcpy(). */
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1010,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1042,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1072,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1088,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1118,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1133,23 +1158,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 }
 
 /*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes read.
  */
 static void
@@ -1157,63 +1165,50 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes().
+	 * We only need a read barrier here because the increment of mq_bytes_read
+	 * is actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method should be cheaper.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
 /*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with the read barrier found in
+	 * shm_mq_receive_bytes.
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
-- 
2.13.5 (Apple Git-94)

Attachment: 0002-shm-mq-reduce-receiver-latch-set-v1.patch (application/octet-stream)
From 666d33a363036a647dde83cb28b9d7ad0b31f76c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v1

---
 src/backend/storage/ipc/shm_mq.c | 69 +++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 75c6bbd4fb..ceeb487390 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -142,10 +142,10 @@ struct shm_mq_handle
 };
 
 static void shm_mq_detach_internal(shm_mq *mq);
-static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
-static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
-					 bool nowait, Size *nbytesp, void **datap);
+static shm_mq_result shm_mq_receive_bytes(shm_mq_handle *mqh,
+				  Size bytes_needed, bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
@@ -585,8 +585,14 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_counterparty_attached = true;
 	}
 
-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it
+	 * too often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
 	{
 		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
 		mqh->mqh_consume_pending = 0;
@@ -597,7 +603,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	{
 		/* Try to receive the message length word. */
 		Assert(mqh->mqh_partial_bytes < sizeof(Size));
-		res = shm_mq_receive_bytes(mq, sizeof(Size) - mqh->mqh_partial_bytes,
+		res = shm_mq_receive_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								   nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
@@ -617,13 +623,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				/*
-				 * Technically, we could consume the message length
-				 * information at this point, but the extra write to shared
-				 * memory wouldn't be free and in most cases we would reap no
-				 * benefit.
-				 */
-				mqh->mqh_consume_pending = needed;
+				mqh->mqh_consume_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -635,7 +635,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(sizeof(Size)));
+			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -654,7 +654,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			}
 			Assert(mqh->mqh_buflen >= sizeof(Size));
 
-			/* Copy and consume partial length word. */
+			/* Copy partial length word; remember to consume it. */
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
@@ -662,7 +662,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(lengthbytes));
+			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -684,13 +684,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * we need not copy the data and can return a pointer directly into
 		 * shared memory.
 		 */
-		res = shm_mq_receive_bytes(mq, nbytes, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, nbytes, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending = MAXALIGN(nbytes);
+			mqh->mqh_consume_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -730,13 +730,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_partial_bytes += rb;
 
 		/*
-		 * Update count of bytes read, with alignment padding.  Note that this
-		 * will never actually insert any padding except at the end of a
-		 * message, because the buffer size is a multiple of MAXIMUM_ALIGNOF,
-		 * and each read and write is as well.
+		 * Update count of bytes that can be consumed, accounting for
+		 * alignment padding.  Note that this will never actually insert any
+		 * padding except at the end of a message, because the buffer size is
+		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		shm_mq_inc_bytes_read(mq, MAXALIGN(rb));
+		mqh->mqh_consume_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -744,7 +744,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 
 		/* Wait for some more data. */
 		still_needed = nbytes - mqh->mqh_partial_bytes;
-		res = shm_mq_receive_bytes(mq, still_needed, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, still_needed, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb > still_needed)
@@ -1000,9 +1000,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
  * is SHM_MQ_SUCCESS.
  */
 static shm_mq_result
-shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
+shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 					 Size *nbytesp, void **datap)
 {
+	shm_mq	   *mq = mqh->mqh_queue;
 	Size		ringsize = mq->mq_ring_size;
 	uint64		used;
 	uint64		written;
@@ -1014,7 +1015,13 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 
 		/* Get bytes written, so we can compute what's available to read. */
 		written = pg_atomic_read_u64(&mq->mq_bytes_written);
-		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+
+		/*
+		 * Get bytes read.  Include bytes we could consume but have not yet
+		 * consumed.
+		 */
+		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
+			mqh->mqh_consume_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1045,6 +1052,16 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
+		/*
+		 * We didn't get enough data to satisfy the request, so mark any
+		 * data previously-consumed as read to make more buffer space.
+		 */
+		if (mqh->mqh_consume_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
+			mqh->mqh_consume_pending = 0;
+		}
+
 		/* Skip manipulation of our latch if nowait = true. */
 		if (nowait)
 			return SHM_MQ_WOULD_BLOCK;
-- 
2.13.5 (Apple Git-94)

#55 Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Robert Haas (#54)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Mon, Dec 4, 2017 at 9:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Dec 3, 2017 at 10:30 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

I thought there are some cases (though less) where we want to Shutdown
the nodes (ExecShutdownNode) earlier and release the resources sooner.
However, if you are not completely sure about this change, then we can
leave it as it. Thanks for sharing your thoughts.

OK, thanks. I committed that patch, after first running 100 million
tuples through a Gather over and over again to test for leaks.
Hopefully I haven't missed anything here, but it looks like it's
solid. Here once again are the remaining patches. While the
already-committed patches are nice, these two are the ones that

Hi,
I spent some time verifying this memory-leak patch for the gather-merge
case, and it looks good to me as well. In the query I tried, around 10
million tuples were passed through gather-merge. Going by the output of
top, memory usage looks acceptable and the memory is freed once the query
completes. Since I was testing on my local system only, I went up to 8
workers and didn't find any memory leaks for the queries I tried.
The test case is in the attached file.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

ml_test_query.sql (application/octet-stream)
#56 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#54)
Re: [HACKERS] [POC] Faster processing at Gather node

Hi,

On 2017-12-04 10:50:53 -0500, Robert Haas wrote:

Subject: [PATCH 1/2] shm-mq-less-spinlocks-v2

+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.

I don't recall our conversation around this anymore, and haven't read
down far enough to see the relevant code. Lest I forget: such constructs
often need careful use of barriers.

- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.

s/should/is/ or similar?

Perhaps a short benchmark for 32bit systems using shm_mq wouldn't hurt?
I suspect there won't be much of a performance impact, but it's probably
worth checking.

* mq_ring_size and mq_ring_offset never change after initialization, and
* can therefore be read without the lock.
*
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and

Hm, why did you remove the first part about mq_ring itself?

@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,

while (sent < nbytes)
{
- bool detached;
uint64 rb;
+ uint64 wb;

/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;

Just to make sure my understanding is correct: No barriers needed here
because "bytes_written" is only written by the sending backend, and
"bytes_read" cannot lap it. Correct?

Assert(used <= ringsize);
available = Min(ringsize - used, nbytes - sent);

/* Bail out if the queue has been detached. */
-		if (detached)
+		if (mq->mq_detached)

Hm, do all paths here guarantee that mq->mq_detached won't be stored on
the stack / register in the first iteration, and then not reread any
further? I think it's fine because every branch of the if below ends up
in a syscall / barrier, but it might be worth noting that here.

+			/*
+			 * Since mq->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);

Perhaps mention that this could lead to spuriously signalling the wrong
backend in case of detach, but that that is fine?

/* Skip manipulation of our latch if nowait = true. */
if (nowait)
@@ -934,10 +953,18 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
}
else
{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
+
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);

I know the scheme isn't new, but I do find it not immediately obvious
that 'wb' is short for 'bytes_written'.

-			/* Write as much data as we can via a single memcpy(). */
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 */

s/above/above. Otherwise a newer mq_bytes_read could become visible
before the corresponding reads have fully finished./?

Could you also add a comment as to why you think a read barrier isn't
sufficient? IIUC that's the case because we need to prevent reordering
in both directions: Can't neither start reading based on a "too new"
bytes_read, nor can affort writes to mq_ring being reordered to before
the barrier. Correct?

+ pg_memory_barrier();
memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
(char *) data + sent, sendnow);
sent += sendnow;

Btw, this mq_ring_offset stuff seems a bit silly, why don't we use
proper padding/union in the struct to make it unnecessary to do that bit
of offset calculation every time? I think it currently prevents
efficient address calculation instructions from being used.

From 666d33a363036a647dde83cb28b9d7ad0b31f76c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v1

-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it
+	 * too often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
{

Hm. Why are we doing this at the level of updating the variables, rather
than SetLatch calls?

Greetings,

Andres Freund

#57 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#56)
2 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Tue, Jan 9, 2018 at 7:09 PM, Andres Freund <andres@anarazel.de> wrote:

+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.

I don't recall our conversation around this anymore, and haven't read
down far enough to see the relevant code. Lest I forget: such constructs
often need careful use of barriers.

I think the only thing the code assumes here is that if we previously
read the value with the spinlock and didn't get NULL, we can later
read the value without the spinlock and count on seeing the same value
we saw previously. I think that's safe enough.
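The lock-until-first-observed pattern being defended here can be sketched outside PostgreSQL roughly as follows. All names are hypothetical, and the toy spinlock is a single-threaded placeholder rather than a real lock; the point is only the control flow around a write-once pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder lock, standing in for SpinLockAcquire/SpinLockRelease. */
typedef struct { int locked; } toy_spinlock;
static void toy_lock(toy_spinlock *l)   { l->locked = 1; }
static void toy_unlock(toy_spinlock *l) { l->locked = 0; }

typedef struct
{
	toy_spinlock mutex;
	void	   *receiver;		/* write-once: NULL until attach */
} toy_queue;

typedef struct
{
	toy_queue  *q;
	int			counterparty_attached;	/* "saw it non-NULL once" */
} toy_handle;

/*
 * Return the receiver pointer.  Take the lock only until we have observed
 * the pointer non-NULL once; after that, it cannot change, so subsequent
 * reads skip the lock entirely.
 */
static void *
toy_get_receiver(toy_handle *h)
{
	void	   *r;

	if (h->counterparty_attached)
		return h->q->receiver;

	toy_lock(&h->q->mutex);
	r = h->q->receiver;
	toy_unlock(&h->q->mutex);
	if (r != NULL)
		h->counterparty_attached = 1;
	return r;
}
```

The cached flag in the handle is what makes the fast path lock-free: the spinlock is paid at most once per handle, not once per send.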

s/should/is/ or similar?

I prefer it the way that I have it.

Perhaps a short benchmark for 32bit systems using shm_mq wouldn't hurt?
I suspect there won't be much of a performance impact, but it's probably
worth checking.

I don't think I understand your concern here. If this is used on a
system where we're emulating atomics and barriers in painful ways, it
might hurt performance, but I think we have a policy of not caring.

Also, I don't know where I'd find a 32-bit system to test.

- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * At any given time, the difference between mq_bytes_read and

Hm, why did you remove the first part about mq_ring itself?

Bad editing. Restored.

@@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,

while (sent < nbytes)
{
- bool detached;
uint64 rb;
+ uint64 wb;

/* Compute number of ring buffer bytes used and available. */
-             rb = shm_mq_get_bytes_read(mq, &detached);
-             Assert(mq->mq_bytes_written >= rb);
-             used = mq->mq_bytes_written - rb;
+             rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+             wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+             Assert(wb >= rb);
+             used = wb - rb;

Just to make sure my understanding is correct: No barriers needed here
because "bytes_written" is only written by the sending backend, and
"bytes_read" cannot lap it. Correct?

We can't possibly read a stale value of mq_bytes_written because we
are the only process that can write it. It's possible that the
receiver has just increased mq_bytes_read and that the change isn't
visible to us yet, but if so, the sender's also going to set our
latch, or has done so already. So the worst thing that happens is
that we decide to sleep because it looks like no space is available
and almost immediately get woken up because there really is space.
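The arithmetic this reasoning protects is just modular subtraction of two monotonically increasing byte counters, as in the quoted hunk. A minimal sketch (hypothetical helper names, plain uint64_t standing in for pg_atomic_uint64) shows that `wb - rb` stays correct even if the counters someday wrap:

```c
#include <stdint.h>

/*
 * Bytes currently in the ring: the difference of the ever-increasing
 * written/read counters.  Unsigned subtraction is modular, so this is
 * correct even across uint64 wraparound, as long as the two counters
 * never drift apart by more than the ring size.
 */
static uint64_t
ring_used(uint64_t wb, uint64_t rb)
{
	return wb - rb;
}

/* Free space available for the sender to write into. */
static uint64_t
ring_available(uint64_t ringsize, uint64_t wb, uint64_t rb)
{
	return ringsize - ring_used(wb, rb);
}
```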

Assert(used <= ringsize);
available = Min(ringsize - used, nbytes - sent);

/* Bail out if the queue has been detached. */
-             if (detached)
+             if (mq->mq_detached)

Hm, do all paths here guarantee that mq->mq_detached won't be stored on
the stack / register in the first iteration, and then not reread any
further? I think it's fine because every branch of the if below ends up
in a syscall / barrier, but it might be worth noting that here.

Aargh. I hate compilers. I added a comment. Should I think about
changing mq_detached to pg_atomic_uint32 instead?

Perhaps mention that this could lead to spuriously signalling the wrong
backend in case of detach, but that that is fine?

I think that's a general risk of latches that doesn't need to be
specifically recapitulated here.

I know the scheme isn't new, but I do find it not immediately obvious
that 'wb' is short for 'bytes_written'.

Sorry.

-                     /* Write as much data as we can via a single memcpy(). */
+                     /*
+                      * Write as much data as we can via a single memcpy(). Make sure
+                      * these writes happen after the read of mq_bytes_read, above.
+                      * This barrier pairs with the one in shm_mq_inc_bytes_read.
+                      */

s/above/above. Otherwise a newer mq_bytes_read could become visible
before the corresponding reads have fully finished./?

I don't find that very clear. A newer mq_bytes_read could become
visible whenever, and a barrier doesn't prevent that from happening.
What it does is ensure (together with the one in
shm_mq_inc_bytes_read) that we don't try to read bytes that aren't
fully *written* yet.

Generally, my mental model is that barriers make things happen in
program order rather than some other order that the CPU thinks would
be fun. Imagine a single-core server doing all of this stuff the "old
school" way. If the reader puts data into the queue before
advertising its presence and the writer finishes using the data from
the queue before advertising its consumption, then everything works.
If you do anything else, it's flat busted, even on that single-core
system, because a context switch could happen at any time, and then
you might read data that isn't written yet. The barrier just ensures
that we get that order of execution even on fancy modern hardware, but
I'm not sure how much of that we really need to explain here.
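That "program order on a single core" mental model can be made concrete with a miniature single-producer/single-consumer ring mirroring the mq_bytes_read/mq_bytes_written scheme. This is only a sketch: it uses C11 acquire/release atomics where the patch uses pg_atomic_* plus explicit pg_read_barrier/pg_write_barrier/pg_memory_barrier calls, and all names are hypothetical:

```c
#include <stdatomic.h>
#include <string.h>

#define RING_SIZE 8

typedef struct
{
	_Atomic unsigned long bytes_read;		/* advanced only by receiver */
	_Atomic unsigned long bytes_written;	/* advanced only by sender */
	char		ring[RING_SIZE];
} mini_mq;

/* Sender: copy data into the ring, then publish it by bumping the counter. */
static size_t
mini_send(mini_mq *mq, const char *data, size_t n)
{
	unsigned long wb = atomic_load_explicit(&mq->bytes_written,
											memory_order_relaxed);
	unsigned long rb = atomic_load_explicit(&mq->bytes_read,
											memory_order_acquire);
	size_t		avail = RING_SIZE - (size_t) (wb - rb);
	size_t		offset = wb % RING_SIZE;
	size_t		sendnow;

	if (n > avail)
		n = avail;
	sendnow = n;
	if (sendnow > RING_SIZE - offset)
		sendnow = RING_SIZE - offset;	/* stop at the wrap point */
	memcpy(&mq->ring[offset], data, sendnow);
	/* release store: ring contents become visible before the new counter */
	atomic_store_explicit(&mq->bytes_written, wb + sendnow,
						  memory_order_release);
	return sendnow;
}

/* Receiver: observe the counter, then read only the bytes it covers. */
static size_t
mini_recv(mini_mq *mq, char *out, size_t max)
{
	unsigned long rb = atomic_load_explicit(&mq->bytes_read,
											memory_order_relaxed);
	/* acquire load: pairs with the release store in mini_send */
	unsigned long wb = atomic_load_explicit(&mq->bytes_written,
											memory_order_acquire);
	size_t		used = (size_t) (wb - rb);
	size_t		offset = rb % RING_SIZE;
	size_t		n = used < max ? used : max;

	if (n > RING_SIZE - offset)
		n = RING_SIZE - offset;
	memcpy(out, &mq->ring[offset], n);
	atomic_store_explicit(&mq->bytes_read, rb + n, memory_order_release);
	return n;
}
```

The acquire/release pairing enforces exactly the two orderings discussed above: the receiver never reads ring bytes that aren't fully written, and the sender never overwrites ring bytes that aren't fully read.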

Could you also add a comment as to why you think a read barrier isn't
sufficient? IIUC that's the case because we need to prevent reordering
in both directions: Can't neither start reading based on a "too new"
bytes_read, nor can affort writes to mq_ring being reordered to before
the barrier. Correct?

I can't parse that statement. We're separating the read of
mq_bytes_read from the write to mq_ring. My understanding is that a
read barrier can separate two reads, a write barrier can separate two
writes, and a full barrier is needed to separate a write from a read
in either order. Added a comment to that effect.

+ pg_memory_barrier();
memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
(char *) data + sent, sendnow);
sent += sendnow;

Btw, this mq_ring_offset stuff seems a bit silly, why don't we use
proper padding/union in the struct to make it unnecessary to do that bit
of offset calculation every time? I think it currently prevents
efficient address calculation instructions from being used.

Well, the root cause -- aside from me being a fallible human being
with only limited programing skills -- is that I wanted the parallel
query code to be able to request whatever queue size it preferred
without having to worry about how many bytes of that space was going
to get consumed by overhead. But it would certainly be possible to
change it up, if somebody felt like working out how the API should be
set up. I don't really want to do that right now, though.

From 666d33a363036a647dde83cb28b9d7ad0b31f76c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v1

-     /* Consume any zero-copy data from previous receive operation. */
-     if (mqh->mqh_consume_pending > 0)
+     /*
+      * If we've consumed an amount of data greater than 1/4th of the ring
+      * size, mark it consumed in shared memory.  We try to avoid doing this
+      * unnecessarily when only a small amount of data has been consumed,
+      * because SetLatch() is fairly expensive and we don't want to do it
+      * too often.
+      */
+     if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
{

Hm. Why are we doing this at the level of updating the variables, rather
than SetLatch calls?

Hmm, I'm not sure I understand what you're suggesting here. In
general, we return with the data for the current message unconsumed,
and then consume it the next time the function is called, so that
(except when the message wraps the end of the buffer) we can return a
pointer directly into the buffer rather than having to memcpy(). What
this patch does is postpone consuming the data further, either until
we can free up at least a quarter of the ring buffer or until we need
to wait for more data. It seemed worthwhile to free up space in the
ring buffer occasionally even if we weren't to the point of waiting
yet, so that the sender has an opportunity to write new data into that
space if it happens to still be running.
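The deferral policy described above can be sketched as a local counter of consumed bytes that is only flushed to shared memory once it exceeds a quarter of the ring, or when the receiver is about to wait anyway. The names and the stand-ins for the shared counter and SetLatch() are hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the shared state and the latch. */
static size_t shared_bytes_read = 0;
static int	latch_sets = 0;

static void
flush_consumed(size_t *consume_pending)
{
	shared_bytes_read += *consume_pending;	/* shm_mq_inc_bytes_read */
	*consume_pending = 0;
	latch_sets++;							/* SetLatch() on the sender */
}

/*
 * Mark n bytes consumed locally; publish (and wake the sender) only once
 * more than ring_size/4 has accumulated, or when forced -- i.e. when we
 * are about to block waiting for more data and should free up buffer
 * space first.
 */
static void
consume(size_t *consume_pending, size_t n, size_t ring_size, int force)
{
	*consume_pending += n;
	if (force ? *consume_pending > 0 : *consume_pending > ring_size / 4)
		flush_consumed(consume_pending);
}
```

With a 1024-byte ring, small consumptions accumulate locally and cost no latch sets until the 256-byte threshold is crossed, which is the batching effect the patch is after.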

Slightly revised patches attached. 0002 is unchanged except for being
made pgindent-clean.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-shm-mq-less-spinlocks-v3.patch (application/octet-stream)
From d4c50d41e663bafd66670536b73d4d39fcbf0aff Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:53 +0100
Subject: [PATCH 1/2] shm-mq-less-spinlocks-v3

---
 src/backend/storage/ipc/shm_mq.c | 249 ++++++++++++++++++++-------------------
 1 file changed, 127 insertions(+), 122 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 1131e27e2e..0a2776e37a 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,28 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * Importantly, mq_ring can be safely read and written without a lock.
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +72,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +151,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +180,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +350,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +491,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterparty is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +869,26 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
-		/* Bail out if the queue has been detached. */
-		if (detached)
+		/*
+		 * Bail out if the queue has been detached.  Note that we would be in
+		 * trouble if the compiler decided to cache the value of
+		 * mq->mq_detached in a register or on the stack across loop
+		 * iterations, but it shouldn't do that since we'll always return,
+		 * call an external function that performs a system call, or reach a
+		 * memory barrier at some point later in the loop.
+		 */
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +929,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mq->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +961,20 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
 
-			/* Write as much data as we can via a single memcpy(). */
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
+
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 * (Since we're separating the read of mq_bytes_read from a
+			 * subsequent write to mq_ring, we need a full barrier here.)
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1020,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1052,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1082,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1098,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1128,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1133,23 +1168,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 }
 
 /*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes read.
  */
 static void
@@ -1157,63 +1175,50 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
+	 * only need a read barrier here because the increment of mq_bytes_read is
+	 * actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method should be cheaper.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
 /*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
-/*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with the read barrier found in
+	 * shm_mq_receive_bytes.
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
-- 
2.13.5 (Apple Git-94)

Attachment: 0002-shm-mq-reduce-receiver-latch-set-v2.patch (application/octet-stream)
From 8d9e6f5242b4c287c0abf9a827b21cae0925fb5f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v2

---
 src/backend/storage/ipc/shm_mq.c | 69 +++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 0a2776e37a..3bcc07e6cb 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -143,10 +143,10 @@ struct shm_mq_handle
 };
 
 static void shm_mq_detach_internal(shm_mq *mq);
-static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
-static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
-					 bool nowait, Size *nbytesp, void **datap);
+static shm_mq_result shm_mq_receive_bytes(shm_mq_handle *mqh,
+					 Size bytes_needed, bool nowait, Size *nbytesp, void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
@@ -586,8 +586,14 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_counterparty_attached = true;
 	}
 
-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it too
+	 * often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
 	{
 		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
 		mqh->mqh_consume_pending = 0;
@@ -598,7 +604,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	{
 		/* Try to receive the message length word. */
 		Assert(mqh->mqh_partial_bytes < sizeof(Size));
-		res = shm_mq_receive_bytes(mq, sizeof(Size) - mqh->mqh_partial_bytes,
+		res = shm_mq_receive_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								   nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
@@ -618,13 +624,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				/*
-				 * Technically, we could consume the message length
-				 * information at this point, but the extra write to shared
-				 * memory wouldn't be free and in most cases we would reap no
-				 * benefit.
-				 */
-				mqh->mqh_consume_pending = needed;
+				mqh->mqh_consume_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -636,7 +636,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(sizeof(Size)));
+			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -655,7 +655,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			}
 			Assert(mqh->mqh_buflen >= sizeof(Size));
 
-			/* Copy and consume partial length word. */
+			/* Copy partial length word; remember to consume it. */
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
@@ -663,7 +663,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(lengthbytes));
+			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -685,13 +685,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * we need not copy the data and can return a pointer directly into
 		 * shared memory.
 		 */
-		res = shm_mq_receive_bytes(mq, nbytes, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, nbytes, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending = MAXALIGN(nbytes);
+			mqh->mqh_consume_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -731,13 +731,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_partial_bytes += rb;
 
 		/*
-		 * Update count of bytes read, with alignment padding.  Note that this
-		 * will never actually insert any padding except at the end of a
-		 * message, because the buffer size is a multiple of MAXIMUM_ALIGNOF,
-		 * and each read and write is as well.
+		 * Update count of bytes that can be consumed, accounting for
+		 * alignment padding.  Note that this will never actually insert any
+		 * padding except at the end of a message, because the buffer size is
+		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		shm_mq_inc_bytes_read(mq, MAXALIGN(rb));
+		mqh->mqh_consume_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -745,7 +745,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 
 		/* Wait for some more data. */
 		still_needed = nbytes - mqh->mqh_partial_bytes;
-		res = shm_mq_receive_bytes(mq, still_needed, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, still_needed, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb > still_needed)
@@ -1010,9 +1010,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
  * is SHM_MQ_SUCCESS.
  */
 static shm_mq_result
-shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
+shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 					 Size *nbytesp, void **datap)
 {
+	shm_mq	   *mq = mqh->mqh_queue;
 	Size		ringsize = mq->mq_ring_size;
 	uint64		used;
 	uint64		written;
@@ -1024,7 +1025,13 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 
 		/* Get bytes written, so we can compute what's available to read. */
 		written = pg_atomic_read_u64(&mq->mq_bytes_written);
-		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+
+		/*
+		 * Get bytes read.  Include bytes we could consume but have not yet
+		 * consumed.
+		 */
+		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
+			mqh->mqh_consume_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1055,6 +1062,16 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
+		/*
+		 * We didn't get enough data to satisfy the request, so mark any data
+		 * previously-consumed as read to make more buffer space.
+		 */
+		if (mqh->mqh_consume_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
+			mqh->mqh_consume_pending = 0;
+		}
+
 		/* Skip manipulation of our latch if nowait = true. */
 		if (nowait)
 			return SHM_MQ_WOULD_BLOCK;
-- 
2.13.5 (Apple Git-94)

#58Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#57)
Re: [HACKERS] [POC] Faster processing at Gather node

Hi,

On 2018-01-25 12:09:23 -0500, Robert Haas wrote:

Perhaps a short benchmark for 32bit systems using shm_mq wouldn't hurt?
I suspect there won't be much of a performance impact, but it's probably
worth checking.

I don't think I understand your concern here. If this is used on a
system where we're emulating atomics and barriers in painful ways, it
might hurt performance, but I think we have a policy of not caring.

Well, it's more than just systems like that - for 64bit atomics we
sometimes do fall back to spinlock based atomics on 32bit systems, even
if they support 32 bit atomics.

Also, I don't know where I'd find a 32-bit system to test.

You can compile with -m32 on reasonable systems ;)

Assert(used <= ringsize);
available = Min(ringsize - used, nbytes - sent);

/* Bail out if the queue has been detached. */
-             if (detached)
+             if (mq->mq_detached)

Hm, do all paths here guarantee that mq->mq_detached won't be stored on
the stack / register in the first iteration, and then not reread any
further? I think it's fine because every branch of the if below ends up
in a syscall / barrier, but it might be worth noting on that here.

Aargh. I hate compilers. I added a comment. Should I think about
changing mq_detached to pg_atomic_uint32 instead?

I think a pg_compiler_barrier() would suffice to alleviate my concern,
right? If you wanted to go for an atomic, using pg_atomic_flag would
probably be more appropriate than pg_atomic_uint32.

-                     /* Write as much data as we can via a single memcpy(). */
+                     /*
+                      * Write as much data as we can via a single memcpy(). Make sure
+                      * these writes happen after the read of mq_bytes_read, above.
+                      * This barrier pairs with the one in shm_mq_inc_bytes_read.
+                      */

s/above/above. Otherwise a newer mq_bytes_read could become visible
before the corresponding reads have fully finished./?

I don't find that very clear. A newer mq_bytes_read could become
visible whenever, and a barrier doesn't prevent that from happening.

Well, my point was that the barrier prevents the write to
mq_bytes_read becoming visible before the corresponding reads have
finished. Which then would mean the memcpy() could overwrite them. And a
barrier *does* prevent that from happening.

I don't think this is the same as:

What it does is ensure (together with the one in
shm_mq_inc_bytes_read) that we don't try to read bytes that aren't
fully *written* yet.

which seems much more about the barrier in shm_mq_inc_bytes_written()?

Generally, my mental model is that barriers make things happen in
program order rather than some other order that the CPU thinks would
be fun. Imagine a single-core server doing all of this stuff the "old
school" way. If the reader puts data into the queue before
advertising its presence and the writer finishes using the data from
the queue before advertising its consumption, then everything works.
If you do anything else, it's flat busted, even on that single-core
system, because a context switch could happen at any time, and then
you might read data that isn't written yet. The barrier just ensures
that we get that order of execution even on fancy modern hardware, but
I'm not sure how much of that we really need to explain here.

IDK, I find it nontrivial to understand individual uses of
barriers. There are often multiple non-equivalent ways to use barriers, and
the logic for why a specific one is correct isn't always obvious.

+ pg_memory_barrier();
memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
(char *) data + sent, sendnow);
sent += sendnow;

Btw, this mq_ring_offset stuff seems a bit silly, why don't we use
proper padding/union in the struct to make it unnecessary to do that bit
of offset calculation every time? I think it currently prevents
efficient address calculation instructions from being used.

Well, the root cause -- aside from me being a fallible human being
with only limited programming skills -- is that I wanted the parallel
query code to be able to request whatever queue size it preferred
without having to worry about how many bytes of that space was going
to get consumed by overhead.

What I meant is that instead of doing
struct shm_mq
{
...
bool mq_detached;
uint8 mq_ring_offset;
char mq_ring[FLEXIBLE_ARRAY_MEMBER];
};

it'd be possible to do something like

{
...
bool mq_detached;
union {
char mq_ring[FLEXIBLE_ARRAY_MEMBER];
double forcealign;
} d;
};

which'd force the struct to be laid out so mq_ring is at a suitable
offset. We use that in a bunch of places.

As far as I understand that'd not run counter to your goals of:

without having to worry about how many bytes of that space was going
to get consumed by overhead.

right?

change it up, if somebody felt like working out how the API should be
set up. I don't really want to do that right now, though.

Right.

From 666d33a363036a647dde83cb28b9d7ad0b31f76c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v1

-     /* Consume any zero-copy data from previous receive operation. */
-     if (mqh->mqh_consume_pending > 0)
+     /*
+      * If we've consumed an amount of data greater than 1/4th of the ring
+      * size, mark it consumed in shared memory.  We try to avoid doing this
+      * unnecessarily when only a small amount of data has been consumed,
+      * because SetLatch() is fairly expensive and we don't want to do it
+      * too often.
+      */
+     if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
{

Hm. Why are we doing this at the level of updating the variables, rather
than SetLatch calls?

Hmm, I'm not sure I understand what you're suggesting, here. In
general, we return with the data for the current message unconsumed,
and then consume it the next time the function is called, so that
(except when the message wraps the end of the buffer) we can return a
pointer directly into the buffer rather than having to memcpy(). What
this patch does is postpone consuming the data further, either until
we can free up at least a quarter of the ring buffer or until we need
to wait for more data. It seemed worthwhile to free up space in the
ring buffer occasionally even if we weren't to the point of waiting
yet, so that the sender has an opportunity to write new data into that
space if it happens to still be running.

What I'm trying to suggest is that instead of postponing an update of
mq_bytes_read (by storing amount of already performed reads in
mqh_consume_pending), we continue to update mq_bytes_read but only set
the latch if your above thresholds are crossed. That way a burst of
writes can fully fill the ringbuffer, but the cost of doing a SetLatch()
is amortized. In my testing SetLatch() was the expensive part, not the
necessary barriers in shm_mq_inc_bytes_read().

- Andres

#59Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#58)
Re: [HACKERS] [POC] Faster processing at Gather node

On Wed, Feb 7, 2018 at 1:41 PM, Andres Freund <andres@anarazel.de> wrote:

Well, it's more than just systems like that - for 64bit atomics we
sometimes do fall back to spinlock based atomics on 32bit systems, even
if they support 32 bit atomics.

I built with -m32 on my laptop and tried "select aid, count(*) from
pgbench_accounts group by 1 having count(*) > 1" on pgbench at scale
factor 100 with pgbench_accounts_pkey dropped and
max_parallel_workers_per_gather set to 10 on (a) commit
0b5e33f667a2042d7022da8bef31a8be5937aad1 (I know this is a little old,
but I think it doesn't matter), (b) same plus
shm-mq-less-spinlocks-v3, and (c) same plus shm-mq-less-spinlocks-v3
and shm-mq-reduce-receiver-latch-set-v2.

(a) 16563.790 ms, 16625.257 ms, 16496.062 ms
(b) 17217.051 ms, 17157.745 ms, 17225.755 ms [median to median +3.9% vs. (a)]
(c) 15491.947 ms, 15455.840 ms, 15452.649 ms [median to median -7.0%
vs. (a), -10.2% vs (b)]

Do you think that's a problem? If it is, what do you think we should
do about it? It seems to me that it's probably OK because (1) with
both patches we still come out ahead and (2) 32-bit systems will
presumably continue to become rarer as time goes on, but you might
disagree.

Hm, do all paths here guarantee that mq->mq_detached won't be stored on
the stack / register in the first iteration, and then not reread any
further? I think it's fine because every branch of the if below ends up
in a syscall / barrier, but it might be worth noting on that here.

Aargh. I hate compilers. I added a comment. Should I think about
changing mq_detached to pg_atomic_uint32 instead?

I think a pg_compiler_barrier() would suffice to alleviate my concern,
right? If you wanted to go for an atomic, using pg_atomic_flag would
probably be more appropriate than pg_atomic_uint32.

Hmm, all right, I'll add pg_compiler_barrier().

-                     /* Write as much data as we can via a single memcpy(). */
+                     /*
+                      * Write as much data as we can via a single memcpy(). Make sure
+                      * these writes happen after the read of mq_bytes_read, above.
+                      * This barrier pairs with the one in shm_mq_inc_bytes_read.
+                      */

s/above/above. Otherwise a newer mq_bytes_read could become visible
before the corresponding reads have fully finished./?

I don't find that very clear. A newer mq_bytes_read could become
visible whenever, and a barrier doesn't prevent that from happening.

Well, my point was that the barrier prevents the write to
mq_bytes_read becoming visible before the corresponding reads have
finished. Which then would mean the memcpy() could overwrite them. And a
barrier *does* prevent that from happening.

I think we're talking about the same thing, but not finding each
others' explanations very clear.

Hmm, I'm not sure I understand what you're suggesting, here. In
general, we return with the data for the current message unconsumed,
and then consume it the next time the function is called, so that
(except when the message wraps the end of the buffer) we can return a
pointer directly into the buffer rather than having to memcpy(). What
this patch does is postpone consuming the data further, either until
we can free up at least a quarter of the ring buffer or until we need
to wait for more data. It seemed worthwhile to free up space in the
ring buffer occasionally even if we weren't to the point of waiting
yet, so that the sender has an opportunity to write new data into that
space if it happens to still be running.

What I'm trying to suggest is that instead of postponing an update of
mq_bytes_read (by storing amount of already performed reads in
mqh_consume_pending), we continue to update mq_bytes_read but only set
the latch if your above thresholds are crossed. That way a burst of
writes can fully fill the ringbuffer, but the cost of doing a SetLatch()
is amortized. In my testing SetLatch() was the expensive part, not the
necessary barriers in shm_mq_inc_bytes_read().

OK, I'll try to check how feasible that would be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#59)
Re: [HACKERS] [POC] Faster processing at Gather node

Hi,

On 2018-02-27 16:03:17 -0500, Robert Haas wrote:

On Wed, Feb 7, 2018 at 1:41 PM, Andres Freund <andres@anarazel.de> wrote:

Well, it's more than just systems like that - for 64bit atomics we
sometimes do fall back to spinlock based atomics on 32bit systems, even
if they support 32 bit atomics.

I built with -m32 on my laptop and tried "select aid, count(*) from
pgbench_accounts group by 1 having count(*) > 1" on pgbench at scale
factor 100 with pgbench_accounts_pkey dropped and
max_parallel_workers_per_gather set to 10 on (a) commit
0b5e33f667a2042d7022da8bef31a8be5937aad1 (I know this is a little old,
but I think it doesn't matter), (b) same plus
shm-mq-less-spinlocks-v3, and (c) same plus shm-mq-less-spinlocks-v3
and shm-mq-reduce-receiver-latch-set-v2.

(a) 16563.790 ms, 16625.257 ms, 16496.062 ms
(b) 17217.051 ms, 17157.745 ms, 17225.755 ms [median to median +3.9% vs. (a)]
(c) 15491.947 ms, 15455.840 ms, 15452.649 ms [median to median -7.0%
vs. (a), -10.2% vs (b)]

Do you think that's a problem? If it is, what do you think we should
do about it? It seems to me that it's probably OK because (1) with
both patches we still come out ahead and (2) 32-bit systems will
presumably continue to become rarer as time goes on, but you might
disagree.

No, I think this is fairly reasonable. A fairly extreme use case on a
32-bit machine regressing a bit, while gaining performance in other cases?
That works for me.

OK, I'll try to check how feasible that would be.

cool.

Greetings,

Andres Freund

#61Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#60)
3 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Tue, Feb 27, 2018 at 4:06 PM, Andres Freund <andres@anarazel.de> wrote:

OK, I'll try to check how feasible that would be.

cool.

It's not too hard, but it doesn't really seem to help, so I'm inclined
to leave it alone. To make it work, you need to keep two separate
counters in the shm_mq_handle, one for the number of bytes since we
did an increment and the other for the number of bytes since we sent a
signal. I don't really want to introduce that complexity unless there
is a benefit.

With just 0001 and 0002: 3968.899 ms, 4043.428 ms, 4042.472 ms, 4142.226 ms
With two-separate-counters.patch added: 4123.841 ms, 4101.917 ms,
4063.368 ms, 3985.148 ms

If you take the total of the 4 times, that's an 0.4% slowdown with the
patch applied, but I think that's just noise. It seems possible that
with a larger queue -- and maybe a different query shape it would
help, but I really just want to get the optimizations that I've got
committed, provided that you find them acceptable, rather than spend a
lot of time looking for new optimizations, because:

1. I've got other things to get done.

2. I think that the patches I've got here capture most of the available benefit.

3. This case isn't super-common in the first place -- we generally
want to avoid feeding tons of tuples through the Gather.

4. We might abandon the shm_mq approach entirely and switch to
something like sticking tuples in DSA using the flexible tuple slot
stuff you've proposed elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0002-shm-mq-reduce-receiver-latch-set-v3.patch (application/octet-stream)
From 719edc52e6298790a1d7ab0942e232ef9d9e9954 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 19:03:03 +0100
Subject: [PATCH 2/2] shm-mq-reduce-receiver-latch-set-v3

---
 src/backend/storage/ipc/shm_mq.c | 70 +++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index f3ede48d9b..02a5df8da9 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -143,10 +143,11 @@ struct shm_mq_handle
 };
 
 static void shm_mq_detach_internal(shm_mq *mq);
-static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mq, Size nbytes,
+static shm_mq_result shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes,
 				  const void *data, bool nowait, Size *bytes_written);
-static shm_mq_result shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed,
-					 bool nowait, Size *nbytesp, void **datap);
+static shm_mq_result shm_mq_receive_bytes(shm_mq_handle *mqh,
+					 Size bytes_needed, bool nowait, Size *nbytesp,
+					 void **datap);
 static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
@@ -586,8 +587,14 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_counterparty_attached = true;
 	}
 
-	/* Consume any zero-copy data from previous receive operation. */
-	if (mqh->mqh_consume_pending > 0)
+	/*
+	 * If we've consumed an amount of data greater than 1/4th of the ring
+	 * size, mark it consumed in shared memory.  We try to avoid doing this
+	 * unnecessarily when only a small amount of data has been consumed,
+	 * because SetLatch() is fairly expensive and we don't want to do it too
+	 * often.
+	 */
+	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
 	{
 		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
 		mqh->mqh_consume_pending = 0;
@@ -598,7 +605,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	{
 		/* Try to receive the message length word. */
 		Assert(mqh->mqh_partial_bytes < sizeof(Size));
-		res = shm_mq_receive_bytes(mq, sizeof(Size) - mqh->mqh_partial_bytes,
+		res = shm_mq_receive_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								   nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
@@ -618,13 +625,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				/*
-				 * Technically, we could consume the message length
-				 * information at this point, but the extra write to shared
-				 * memory wouldn't be free and in most cases we would reap no
-				 * benefit.
-				 */
-				mqh->mqh_consume_pending = needed;
+				mqh->mqh_consume_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -636,7 +637,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(sizeof(Size)));
+			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -655,7 +656,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			}
 			Assert(mqh->mqh_buflen >= sizeof(Size));
 
-			/* Copy and consume partial length word. */
+			/* Copy partial length word; remember to consume it. */
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
@@ -663,7 +664,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			shm_mq_inc_bytes_read(mq, MAXALIGN(lengthbytes));
+			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -685,13 +686,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * we need not copy the data and can return a pointer directly into
 		 * shared memory.
 		 */
-		res = shm_mq_receive_bytes(mq, nbytes, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, nbytes, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending = MAXALIGN(nbytes);
+			mqh->mqh_consume_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -731,13 +732,13 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		mqh->mqh_partial_bytes += rb;
 
 		/*
-		 * Update count of bytes read, with alignment padding.  Note that this
-		 * will never actually insert any padding except at the end of a
-		 * message, because the buffer size is a multiple of MAXIMUM_ALIGNOF,
-		 * and each read and write is as well.
+		 * Update count of bytes that can be consumed, accounting for
+		 * alignment padding.  Note that this will never actually insert any
+		 * padding except at the end of a message, because the buffer size is
+		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		shm_mq_inc_bytes_read(mq, MAXALIGN(rb));
+		mqh->mqh_consume_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -745,7 +746,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 
 		/* Wait for some more data. */
 		still_needed = nbytes - mqh->mqh_partial_bytes;
-		res = shm_mq_receive_bytes(mq, still_needed, nowait, &rb, &rawdata);
+		res = shm_mq_receive_bytes(mqh, still_needed, nowait, &rb, &rawdata);
 		if (res != SHM_MQ_SUCCESS)
 			return res;
 		if (rb > still_needed)
@@ -1012,9 +1013,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
  * is SHM_MQ_SUCCESS.
  */
 static shm_mq_result
-shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
+shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 					 Size *nbytesp, void **datap)
 {
+	shm_mq	   *mq = mqh->mqh_queue;
 	Size		ringsize = mq->mq_ring_size;
 	uint64		used;
 	uint64		written;
@@ -1026,7 +1028,13 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 
 		/* Get bytes written, so we can compute what's available to read. */
 		written = pg_atomic_read_u64(&mq->mq_bytes_written);
-		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+
+		/*
+		 * Get bytes read.  Include bytes we could consume but have not yet
+		 * consumed.
+		 */
+		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
+			mqh->mqh_consume_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1057,6 +1065,16 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
+		/*
+		 * We didn't get enough data to satisfy the request, so mark any data
+		 * previously-consumed as read to make more buffer space.
+		 */
+		if (mqh->mqh_consume_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
+			mqh->mqh_consume_pending = 0;
+		}
+
 		/* Skip manipulation of our latch if nowait = true. */
 		if (nowait)
 			return SHM_MQ_WOULD_BLOCK;
-- 
2.14.3 (Apple Git-98)

0001-shm-mq-less-spinlocks-v4.patch (application/octet-stream)
From 0b1a9bacd0e67d7924ff07afd08d9262ec38e1b2 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sat, 4 Nov 2017 17:42:53 +0100
Subject: [PATCH 1/2] shm-mq-less-spinlocks-v4

---
 src/backend/storage/ipc/shm_mq.c | 251 ++++++++++++++++++++-------------------
 1 file changed, 129 insertions(+), 122 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 1131e27e2e..f3ede48d9b 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -31,27 +31,28 @@
  * Some notes on synchronization:
  *
  * mq_receiver and mq_bytes_read can only be changed by the receiver; and
- * mq_sender and mq_bytes_written can only be changed by the sender.  However,
- * because most of these fields are 8 bytes and we don't assume that 8 byte
- * reads and writes are atomic, the spinlock must be taken whenever the field
- * is updated, and whenever it is read by a process other than the one allowed
- * to modify it. But the process that is allowed to modify it is also allowed
- * to read it without the lock.  On architectures where 8-byte writes are
- * atomic, we could replace these spinlocks with memory barriers, but
- * testing found no performance benefit, so it seems best to keep things
- * simple for now.
+ * mq_sender and mq_bytes_written can only be changed by the sender.
+ * mq_receiver and mq_sender are protected by mq_mutex, although, importantly,
+ * they cannot change once set, and thus may be read without a lock once this
+ * is known to be the case.
  *
- * mq_detached can be set by either the sender or the receiver, so the mutex
- * must be held to read or write it.  Memory barriers could be used here as
- * well, if needed.
+ * mq_bytes_read and mq_bytes_written are not protected by the mutex.  Instead,
+ * they are written atomically using 8 byte loads and stores.  Memory barriers
+ * must be carefully used to synchronize reads and writes of these values with
+ * reads and writes of the actual data in mq_ring.
+ *
+ * mq_detached needs no locking.  It can be set by either the sender or the
+ * receiver, but only ever from false to true, so redundant writes don't
+ * matter.  It is important that if we set mq_detached and then set the
+ * counterparty's latch, the counterparty must be certain to see the change
+ * after waking up.  Since SetLatch begins with a memory barrier and ResetLatch
+ * ends with one, this should be OK.
  *
  * mq_ring_size and mq_ring_offset never change after initialization, and
  * can therefore be read without the lock.
  *
- * Importantly, mq_ring can be safely read and written without a lock.  Were
- * this not the case, we'd have to hold the spinlock for much longer
- * intervals, and performance might suffer.  Fortunately, that's not
- * necessary.  At any given time, the difference between mq_bytes_read and
+ * Importantly, mq_ring can be safely read and written without a lock.
+ * At any given time, the difference between mq_bytes_read and
  * mq_bytes_written defines the number of bytes within mq_ring that contain
  * unread data, and mq_bytes_read defines the position where those bytes
  * begin.  The sender can increase the number of unread bytes at any time,
@@ -71,8 +72,8 @@ struct shm_mq
 	slock_t		mq_mutex;
 	PGPROC	   *mq_receiver;
 	PGPROC	   *mq_sender;
-	uint64		mq_bytes_read;
-	uint64		mq_bytes_written;
+	pg_atomic_uint64 mq_bytes_read;
+	pg_atomic_uint64 mq_bytes_written;
 	Size		mq_ring_size;
 	bool		mq_detached;
 	uint8		mq_ring_offset;
@@ -150,11 +151,8 @@ static bool shm_mq_counterparty_gone(volatile shm_mq *mq,
 						 BackgroundWorkerHandle *handle);
 static bool shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 					 BackgroundWorkerHandle *handle);
-static uint64 shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n);
-static uint64 shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached);
 static void shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n);
-static shm_mq_result shm_mq_notify_receiver(volatile shm_mq *mq);
 static void shm_mq_detach_callback(dsm_segment *seg, Datum arg);
 
 /* Minimum queue size is enough for header and at least one chunk of data. */
@@ -182,8 +180,8 @@ shm_mq_create(void *address, Size size)
 	SpinLockInit(&mq->mq_mutex);
 	mq->mq_receiver = NULL;
 	mq->mq_sender = NULL;
-	mq->mq_bytes_read = 0;
-	mq->mq_bytes_written = 0;
+	pg_atomic_init_u64(&mq->mq_bytes_read, 0);
+	pg_atomic_init_u64(&mq->mq_bytes_written, 0);
 	mq->mq_ring_size = size - data_offset;
 	mq->mq_detached = false;
 	mq->mq_ring_offset = data_offset - offsetof(shm_mq, mq_ring);
@@ -352,6 +350,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 {
 	shm_mq_result res;
 	shm_mq	   *mq = mqh->mqh_queue;
+	PGPROC	   *receiver;
 	Size		nbytes = 0;
 	Size		bytes_written;
 	int			i;
@@ -492,8 +491,30 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_length_word_complete = false;
 
+	/* If queue has been detached, let caller know. */
+	if (mq->mq_detached)
+		return SHM_MQ_DETACHED;
+
+	/*
+	 * If the counterpary is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */
+	if (mqh->mqh_counterparty_attached)
+		receiver = mq->mq_receiver;
+	else
+	{
+		SpinLockAcquire(&mq->mq_mutex);
+		receiver = mq->mq_receiver;
+		SpinLockRelease(&mq->mq_mutex);
+		if (receiver == NULL)
+			return SHM_MQ_SUCCESS;
+		mqh->mqh_counterparty_attached = true;
+	}
+
 	/* Notify receiver of the newly-written data, and return. */
-	return shm_mq_notify_receiver(mq);
+	SetLatch(&receiver->procLatch);
+	return SHM_MQ_SUCCESS;
 }
 
 /*
@@ -848,18 +869,28 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 
 	while (sent < nbytes)
 	{
-		bool		detached;
 		uint64		rb;
+		uint64		wb;
 
 		/* Compute number of ring buffer bytes used and available. */
-		rb = shm_mq_get_bytes_read(mq, &detached);
-		Assert(mq->mq_bytes_written >= rb);
-		used = mq->mq_bytes_written - rb;
+		rb = pg_atomic_read_u64(&mq->mq_bytes_read);
+		wb = pg_atomic_read_u64(&mq->mq_bytes_written);
+		Assert(wb >= rb);
+		used = wb - rb;
 		Assert(used <= ringsize);
 		available = Min(ringsize - used, nbytes - sent);
 
-		/* Bail out if the queue has been detached. */
-		if (detached)
+		/*
+		 * Bail out if the queue has been detached.  Note that we would be in
+		 * trouble if the compiler decided to cache the value of
+		 * mq->mq_detached in a register or on the stack across loop
+		 * iterations.  It probably shouldn't do that anyway since we'll
+		 * always return, call an external function that performs a system
+		 * call, or reach a memory barrier at some point later in the loop,
+		 * but just to be sure, insert a compiler barrier here.
+		 */
+		pg_compiler_barrier();
+		if (mq->mq_detached)
 		{
 			*bytes_written = sent;
 			return SHM_MQ_DETACHED;
@@ -900,15 +931,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else if (available == 0)
 		{
-			shm_mq_result res;
-
-			/* Let the receiver know that we need them to read some data. */
-			res = shm_mq_notify_receiver(mq);
-			if (res != SHM_MQ_SUCCESS)
-			{
-				*bytes_written = sent;
-				return res;
-			}
+			/*
+			 * Since mq->mqh_counterparty_attached is known to be true at this
+			 * point, mq_receiver has been set, and it can't change once set.
+			 * Therefore, we can read it without acquiring the spinlock.
+			 */
+			Assert(mqh->mqh_counterparty_attached);
+			SetLatch(&mq->mq_receiver->procLatch);
 
 			/* Skip manipulation of our latch if nowait = true. */
 			if (nowait)
@@ -934,10 +963,20 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 		}
 		else
 		{
-			Size		offset = mq->mq_bytes_written % (uint64) ringsize;
-			Size		sendnow = Min(available, ringsize - offset);
+			Size		offset;
+			Size		sendnow;
 
-			/* Write as much data as we can via a single memcpy(). */
+			offset = wb % (uint64) ringsize;
+			sendnow = Min(available, ringsize - offset);
+
+			/*
+			 * Write as much data as we can via a single memcpy(). Make sure
+			 * these writes happen after the read of mq_bytes_read, above.
+			 * This barrier pairs with the one in shm_mq_inc_bytes_read.
+			 * (Since we're separating the read of mq_bytes_read from a
+			 * subsequent write to mq_ring, we need a full barrier here.)
+			 */
+			pg_memory_barrier();
 			memcpy(&mq->mq_ring[mq->mq_ring_offset + offset],
 				   (char *) data + sent, sendnow);
 			sent += sendnow;
@@ -983,19 +1022,27 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 	for (;;)
 	{
 		Size		offset;
-		bool		detached;
+		uint64		read;
 
 		/* Get bytes written, so we can compute what's available to read. */
-		written = shm_mq_get_bytes_written(mq, &detached);
-		used = written - mq->mq_bytes_read;
+		written = pg_atomic_read_u64(&mq->mq_bytes_written);
+		read = pg_atomic_read_u64(&mq->mq_bytes_read);
+		used = written - read;
 		Assert(used <= ringsize);
-		offset = mq->mq_bytes_read % (uint64) ringsize;
+		offset = read % (uint64) ringsize;
 
 		/* If we have enough data or buffer has wrapped, we're done. */
 		if (used >= bytes_needed || offset + used >= ringsize)
 		{
 			*nbytesp = Min(used, ringsize - offset);
 			*datap = &mq->mq_ring[mq->mq_ring_offset + offset];
+
+			/*
+			 * Separate the read of mq_bytes_written, above, from caller's
+			 * attempt to read the data itself.  Pairs with the barrier in
+			 * shm_mq_inc_bytes_written.
+			 */
+			pg_read_barrier();
 			return SHM_MQ_SUCCESS;
 		}
 
@@ -1007,7 +1054,7 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 		 * receiving a message stored in the buffer even after the sender has
 		 * detached.
 		 */
-		if (detached)
+		if (mq->mq_detached)
 			return SHM_MQ_DETACHED;
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1037,16 +1084,10 @@ shm_mq_receive_bytes(shm_mq *mq, Size bytes_needed, bool nowait,
 static bool
 shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 {
-	bool		detached;
 	pid_t		pid;
 
-	/* Acquire the lock just long enough to check the pointer. */
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
 	/* If the queue has been detached, counterparty is definitely gone. */
-	if (detached)
+	if (mq->mq_detached)
 		return true;
 
 	/* If there's a handle, check worker status. */
@@ -1059,9 +1100,7 @@ shm_mq_counterparty_gone(volatile shm_mq *mq, BackgroundWorkerHandle *handle)
 		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
 		{
 			/* Mark it detached, just to make it official. */
-			SpinLockAcquire(&mq->mq_mutex);
 			mq->mq_detached = true;
-			SpinLockRelease(&mq->mq_mutex);
 			return true;
 		}
 	}
@@ -1091,16 +1130,14 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	{
 		BgwHandleStatus status;
 		pid_t		pid;
-		bool		detached;
 
 		/* Acquire the lock just long enough to check the pointer. */
 		SpinLockAcquire(&mq->mq_mutex);
-		detached = mq->mq_detached;
 		result = (*ptr != NULL);
 		SpinLockRelease(&mq->mq_mutex);
 
 		/* Fail if detached; else succeed if initialized. */
-		if (detached)
+		if (mq->mq_detached)
 		{
 			result = false;
 			break;
@@ -1132,23 +1169,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 	return result;
 }
 
-/*
- * Get the number of bytes read.  The receiver need not use this to access
- * the count of bytes read, but the sender must.
- */
-static uint64
-shm_mq_get_bytes_read(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_read;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
 /*
  * Increment the number of bytes read.
  */
@@ -1157,63 +1177,50 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
 	PGPROC	   *sender;
 
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_read += n;
+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
+	 * only need a read barrier here because the increment of mq_bytes_read is
+	 * actually a read followed by a dependent write.
+	 */
+	pg_read_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method should be cheaper.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_read,
+						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
+
+	/*
+	 * We shouldn't have any bytes to read without a sender, so we can read
+	 * mq_sender here without a lock.  Once it's initialized, it can't change.
+	 */
 	sender = mq->mq_sender;
-	SpinLockRelease(&mq->mq_mutex);
-
-	/* We shouldn't have any bytes to read without a sender. */
 	Assert(sender != NULL);
 	SetLatch(&sender->procLatch);
 }
 
-/*
- * Get the number of bytes written.  The sender need not use this to access
- * the count of bytes written, but the receiver must.
- */
-static uint64
-shm_mq_get_bytes_written(volatile shm_mq *mq, bool *detached)
-{
-	uint64		v;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	v = mq->mq_bytes_written;
-	*detached = mq->mq_detached;
-	SpinLockRelease(&mq->mq_mutex);
-
-	return v;
-}
-
 /*
  * Increment the number of bytes written.
  */
 static void
 shm_mq_inc_bytes_written(volatile shm_mq *mq, Size n)
 {
-	SpinLockAcquire(&mq->mq_mutex);
-	mq->mq_bytes_written += n;
-	SpinLockRelease(&mq->mq_mutex);
-}
-
-/*
- * Set receiver's latch, unless queue is detached.
- */
-static shm_mq_result
-shm_mq_notify_receiver(volatile shm_mq *mq)
-{
-	PGPROC	   *receiver;
-	bool		detached;
-
-	SpinLockAcquire(&mq->mq_mutex);
-	detached = mq->mq_detached;
-	receiver = mq->mq_receiver;
-	SpinLockRelease(&mq->mq_mutex);
-
-	if (detached)
-		return SHM_MQ_DETACHED;
-	if (receiver)
-		SetLatch(&receiver->procLatch);
-	return SHM_MQ_SUCCESS;
+	/*
+	 * Separate prior reads of mq_ring from the write of mq_bytes_written
+	 * which we're about to do.  Pairs with the read barrier found in
+	 * shm_mq_get_receive_bytes.
+	 */
+	pg_write_barrier();
+
+	/*
+	 * There's no need to use pg_atomic_fetch_add_u64 here, because nobody
+	 * else can be changing this value.  This method avoids taking the bus
+	 * lock unnecessarily.
+	 */
+	pg_atomic_write_u64(&mq->mq_bytes_written,
+						pg_atomic_read_u64(&mq->mq_bytes_written) + n);
 }
 
 /* Shim for on_dsm_callback. */
-- 
2.14.3 (Apple Git-98)

two-separate-counters.patch (application/octet-stream)
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 02a5df8da9..7e1a555094 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -107,6 +107,10 @@ struct shm_mq
  * locally by copying the chunks into a backend-local buffer.  mqh_buffer is
  * the buffer, and mqh_buflen is the number of bytes allocated for it.
  *
+ * mqh_increment_pending is the number of bytes that we have locally consumed
+ * but not yet marked added to mq_bytes_read, while mqh_signal_pending is the
+ * number of bytes we have consumed without signalling the sender.
+ *
  * mqh_partial_bytes, mqh_expected_bytes, and mqh_length_word_complete
  * are used to track the state of non-blocking operations.  When the caller
  * attempts a non-blocking operation that returns SHM_MQ_WOULD_BLOCK, they
@@ -134,7 +138,8 @@ struct shm_mq_handle
 	BackgroundWorkerHandle *mqh_handle;
 	char	   *mqh_buffer;
 	Size		mqh_buflen;
-	Size		mqh_consume_pending;
+	Size		mqh_increment_pending;
+	Size		mqh_signal_pending;
 	Size		mqh_partial_bytes;
 	Size		mqh_expected_bytes;
 	bool		mqh_length_word_complete;
@@ -293,7 +298,8 @@ shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)
 	mqh->mqh_handle = handle;
 	mqh->mqh_buffer = NULL;
 	mqh->mqh_buflen = 0;
-	mqh->mqh_consume_pending = 0;
+	mqh->mqh_increment_pending = 0;
+	mqh->mqh_signal_pending = 0;
 	mqh->mqh_partial_bytes = 0;
 	mqh->mqh_expected_bytes = 0;
 	mqh->mqh_length_word_complete = false;
@@ -594,10 +600,15 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 	 * because SetLatch() is fairly expensive and we don't want to do it too
 	 * often.
 	 */
-	if (mqh->mqh_consume_pending > mq->mq_ring_size / 4)
+	if (mqh->mqh_increment_pending > 0)
+	{
+		shm_mq_inc_bytes_read(mq, mqh->mqh_increment_pending);
+		mqh->mqh_increment_pending = 0;
+	}
+	if (mqh->mqh_signal_pending > mq->mq_ring_size / 4)
 	{
-		shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
-		mqh->mqh_consume_pending = 0;
+		SetLatch(&mq->mq_sender->procLatch);
+		mqh->mqh_signal_pending = 0;
 	}
 
 	/* Try to read, or finish reading, the length word from the buffer. */
@@ -625,7 +636,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			needed = MAXALIGN(sizeof(Size)) + MAXALIGN(nbytes);
 			if (rb >= needed)
 			{
-				mqh->mqh_consume_pending += needed;
+				mqh->mqh_increment_pending += needed;
+				mqh->mqh_signal_pending += needed;
 				*nbytesp = nbytes;
 				*datap = ((char *) rawdata) + MAXALIGN(sizeof(Size));
 				return SHM_MQ_SUCCESS;
@@ -637,7 +649,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			 */
 			mqh->mqh_expected_bytes = nbytes;
 			mqh->mqh_length_word_complete = true;
-			mqh->mqh_consume_pending += MAXALIGN(sizeof(Size));
+			mqh->mqh_increment_pending += MAXALIGN(sizeof(Size));
+			mqh->mqh_signal_pending += MAXALIGN(sizeof(Size));
 			rb -= MAXALIGN(sizeof(Size));
 		}
 		else
@@ -664,7 +677,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;
-			mqh->mqh_consume_pending += MAXALIGN(lengthbytes);
+			mqh->mqh_increment_pending += MAXALIGN(lengthbytes);
+			mqh->mqh_signal_pending += MAXALIGN(lengthbytes);
 			rb -= lengthbytes;
 
 			/* If we now have the whole word, we're ready to read payload. */
@@ -692,7 +706,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		if (rb >= nbytes)
 		{
 			mqh->mqh_length_word_complete = false;
-			mqh->mqh_consume_pending += MAXALIGN(nbytes);
+			mqh->mqh_increment_pending += MAXALIGN(nbytes);
+			mqh->mqh_signal_pending += MAXALIGN(nbytes);
 			*nbytesp = nbytes;
 			*datap = rawdata;
 			return SHM_MQ_SUCCESS;
@@ -738,7 +753,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 		 * a multiple of MAXIMUM_ALIGNOF, and each read and write is as well.
 		 */
 		Assert(mqh->mqh_partial_bytes == nbytes || rb == MAXALIGN(rb));
-		mqh->mqh_consume_pending += MAXALIGN(rb);
+		mqh->mqh_increment_pending += MAXALIGN(rb);
+		mqh->mqh_signal_pending += MAXALIGN(rb);
 
 		/* If we got all the data, exit the loop. */
 		if (mqh->mqh_partial_bytes >= nbytes)
@@ -1034,7 +1050,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 		 * consumed.
 		 */
 		read = pg_atomic_read_u64(&mq->mq_bytes_read) +
-			mqh->mqh_consume_pending;
+			mqh->mqh_increment_pending;
 		used = written - read;
 		Assert(used <= ringsize);
 		offset = read % (uint64) ringsize;
@@ -1067,12 +1083,18 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 
 		/*
 		 * We didn't get enough data to satisfy the request, so mark any data
-		 * previously-consumed as read to make more buffer space.
+		 * previously-consumed as read to make more buffer space, and signal
+		 * the sender as necessary to make sure they're not waiting for us.
 		 */
-		if (mqh->mqh_consume_pending > 0)
+		if (mqh->mqh_increment_pending > 0)
+		{
+			shm_mq_inc_bytes_read(mq, mqh->mqh_increment_pending);
+			mqh->mqh_increment_pending = 0;
+		}
+		if (mqh->mqh_signal_pending > 0)
 		{
-			shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
-			mqh->mqh_consume_pending = 0;
+			SetLatch(&mq->mq_sender->procLatch);
+			mqh->mqh_signal_pending = 0;
 		}
 
 		/* Skip manipulation of our latch if nowait = true. */
@@ -1193,8 +1215,6 @@ shm_mq_wait_internal(volatile shm_mq *mq, PGPROC *volatile *ptr,
 static void
 shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 {
-	PGPROC	   *sender;
-
 	/*
 	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
 	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
@@ -1209,14 +1229,6 @@ shm_mq_inc_bytes_read(volatile shm_mq *mq, Size n)
 	 */
 	pg_atomic_write_u64(&mq->mq_bytes_read,
 						pg_atomic_read_u64(&mq->mq_bytes_read) + n);
-
-	/*
-	 * We shouldn't have any bytes to read without a sender, so we can read
-	 * mq_sender here without a lock.  Once it's initialized, it can't change.
-	 */
-	sender = mq->mq_sender;
-	Assert(sender != NULL);
-	SetLatch(&sender->procLatch);
 }
 
 /*
#62Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#61)
Re: [HACKERS] [POC] Faster processing at Gather node

On Wed, Feb 28, 2018 at 10:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:

[ latest patches ]

Committed. Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#63Tels
nospam-pg-abuse@bloodgate.com
In reply to: Robert Haas (#62)
Re: [HACKERS] [POC] Faster processing at Gather node

Hello Robert,

On Fri, March 2, 2018 12:22 pm, Robert Haas wrote:

On Wed, Feb 28, 2018 at 10:06 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

[ latest patches ]

Committed. Thanks for the review.

Cool :)

There is a typo, tho:

+	/*
+	 * If the counterpary is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */

s/counterpary/counterparty/;

Sorry, only noticed while re-reading the thread.

Also, either a double space is missing, or one is too many:

+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
+	 * only need a read barrier here because the increment of mq_bytes_read is
+	 * actually a read followed by a dependent write.
+	 */

(" Pairs ..." vs. ". We only ...")

Best regards,

Tels

#64Bruce Momjian
bruce@momjian.us
In reply to: Tels (#63)
1 attachment(s)
Re: [HACKERS] [POC] Faster processing at Gather node

On Fri, Mar 2, 2018 at 05:21:28PM -0500, Tels wrote:

Hello Robert,

On Fri, March 2, 2018 12:22 pm, Robert Haas wrote:

On Wed, Feb 28, 2018 at 10:06 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

[ latest patches ]

Committed. Thanks for the review.

Cool :)

There is a typo, tho:

+	/*
+	 * If the counterpary is known to have attached, we can read mq_receiver
+	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
+	 * more caution is needed.
+	 */

s/counterpary/counterparty/;

Sorry, only noticed while re-reading the thread.

Also, either a double space is missing, or one is too many:

+	/*
+	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
+	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
+	 * only need a read barrier here because the increment of mq_bytes_read is
+	 * actually a read followed by a dependent write.
+	 */

(" Pairs ..." vs. ". We only ...")

Best regards,

Change applied with the attached patch.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

shm.diff (text/x-diff; charset=us-ascii)
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
new file mode 100644
index 3faace2..c80cb6e
*** a/src/backend/storage/ipc/shm_mq.c
--- b/src/backend/storage/ipc/shm_mq.c
*************** shm_mq_sendv(shm_mq_handle *mqh, shm_mq_
*** 493,499 ****
  		return SHM_MQ_DETACHED;
  
  	/*
! 	 * If the counterpary is known to have attached, we can read mq_receiver
  	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
  	 * more caution is needed.
  	 */
--- 493,499 ----
  		return SHM_MQ_DETACHED;
  
  	/*
! 	 * If the counterparty is known to have attached, we can read mq_receiver
  	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
  	 * more caution is needed.
  	 */
*************** shm_mq_inc_bytes_read(shm_mq *mq, Size n
*** 1203,1211 ****
  
  	/*
  	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
! 	 * which follows.  Pairs with the full barrier in shm_mq_send_bytes(). We
! 	 * only need a read barrier here because the increment of mq_bytes_read is
! 	 * actually a read followed by a dependent write.
  	 */
  	pg_read_barrier();
  
--- 1203,1211 ----
  
  	/*
  	 * Separate prior reads of mq_ring from the increment of mq_bytes_read
! 	 * which follows.  This pairs with the full barrier in shm_mq_send_bytes().
! 	 * We only need a read barrier here because the increment of mq_bytes_read
! 	 * is actually a read followed by a dependent write.
  	 */
  	pg_read_barrier();