contrib/cache_scan (Re: What's needed for cache-only table scan?)

Started by Kohei KaiGai · almost 12 years ago · 25 messages
#1 Kohei KaiGai
kaigai@kaigai.gr.jp
1 attachment(s)

Hello,

The attached patch is what we discussed just before the November commit-fest.

It implements an alternative way to scan a particular table using an in-memory
cache instead of the usual heap access method. Unlike the buffer cache, this
mechanism caches only a limited number of columns in memory, so memory
consumption per tuple is much smaller than with the regular heap access method,
which allows a much larger number of tuples to be kept in memory.
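
For illustration, the column-culling step when a tuple is loaded into the cache
looks roughly like the sketch below. It is modelled on ccache_insert_tuple() in
the attached ccache.c; the helper name here is made up for this example.

#include "postgres.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
#include "nodes/bitmapset.h"

/*
 * Sketch: build a slimmed-down copy of "tuple" that keeps only the columns
 * listed in attrs_used; every other column is forced to NULL, so it takes
 * no space in the cached copy.
 */
static HeapTuple
build_culled_tuple(TupleDesc tupdesc, HeapTuple tuple, Bitmapset *attrs_used)
{
	Datum  *values = palloc(sizeof(Datum) * tupdesc->natts);
	bool   *isnull = palloc(sizeof(bool) * tupdesc->natts);
	int		i;

	heap_deform_tuple(tuple, tupdesc, values, isnull);
	for (i = 0; i < tupdesc->natts; i++)
	{
		int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;

		/* drop the columns that the cache does not keep */
		if (!bms_is_member(attidx, attrs_used))
			isnull[i] = true;
	}
	return heap_form_tuple(tupdesc, values, isnull);
}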

I'd like to extend this idea to cache data in a column-oriented structure, so
that parallel processors such as the CPU's SIMD operations or simple GPU cores
can be utilized. (It probably makes sense to evaluate multiple records with a
single vector instruction if the contents of a particular column are stored as
a large array.)
However, this patch still keeps all the tuples in row-oriented format, because
row <=> column translation would make the patch bigger than its current form
(about 2KL), and GPU integration needs to link a proprietary library (CUDA or
OpenCL), which I thought was not preferable for the upstream code.
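
Just to illustrate the longer-term idea (this is a toy sketch, not part of the
patch): once a column is kept as a flat array, a qualifier over that column
becomes a tight loop that the compiler can evaluate with SIMD instructions,
several rows per instruction.

#include <stdbool.h>
#include <stdint.h>

/* toy example: evaluate "value > threshold" over a whole column at once */
static void
eval_gt_int32(const int32_t *column, int nrows, int32_t threshold, bool *result)
{
	int		i;

	for (i = 0; i < nrows; i++)
		result[i] = (column[i] > threshold);
}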

Also note that this patch needs the part-1 ~ part-3 patches of the CustomScan
APIs as prerequisites, because it is implemented on top of those APIs.

One thing I have to apologize for is the lack of documentation and source code
comments around the contrib/ code. Please give me a couple of days to clean up
the code.
Aside from the extension code, I put two enhancements into the core code, as
follows. I'd like to have a discussion about the adequacy of these enhancements.

The first enhancement is a hook on heap_page_prune() to synchronize the
internal state of the extension with changes to the heap image on disk.
It is unavoidable that the cache accumulates garbage over time, so it needs to
be cleaned up the way the vacuum process does. The best timing to do so is
when dead tuples are reclaimed, because at that point it is certain that nobody
will reference those tuples any more.

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
    bool        marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
                 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
     * and update FSM with the remaining space.
     */
+   /*
+    * This callback allows extensions to synchronize their own status with
+    * heap image on the disk, when this buffer page is vacuumed.
+    */
+   if (heap_page_prune_hook)
+       (*heap_page_prune_hook)(relation,
+                               buffer,
+                               ndeleted,
+                               OldestXmin,
+                               prstate.latestRemovedXid);
    return ndeleted;
 }
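
For reference, the consumer side of this hook looks roughly as follows. This is
only a sketch: the callback signature is taken from the call site above, the
callback name is illustrative, and the real code lives in the attached cscan.c
and ccache.c.

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "cache_scan.h"

static heap_page_prune_hook_type heap_page_prune_next = NULL;

/* called whenever dead tuples on a heap page have been reclaimed */
static void
cs_heap_page_prune_hook(Relation relation, Buffer buffer, int ndeleted,
						TransactionId OldestXmin,
						TransactionId latestRemovedXid)
{
	ccache_head *ccache;

	/* chain to any previously installed hook first */
	if (heap_page_prune_next)
		(*heap_page_prune_next)(relation, buffer, ndeleted,
								OldestXmin, latestRemovedXid);

	/* look up an existing cache of this relation; do not create one */
	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
	if (!ccache)
		return;

	/* drop the cached copies of the tuples reclaimed on this page */
	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
	ccache_vacuum_page(ccache, buffer);
	LWLockRelease(ccache->lock);

	cs_put_ccache(ccache);
}

/* registration, typically done from _PG_init() */
static void
cs_register_prune_hook(void)
{
	heap_page_prune_next = heap_page_prune_hook;
	heap_page_prune_hook = cs_heap_page_prune_hook;
}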

The second enhancement makes SetHintBits() accept InvalidBuffer and skip all
its work in that case. We need to check the visibility of cached tuples when
the custom-scan node scans the cached table instead of the heap.
Even though we can use an MVCC snapshot to check a tuple's visibility, it may
internally set hint bits on the tuple, so we would always need to give a valid
buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately, that kills all
the benefit of the table cache if it requires loading the heap buffer
associated with each cached tuple.
So, I'd like to have special-case handling in SetHintBits() that turns it into
a no-op when InvalidBuffer is given.

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * If the supplied HeapTuple is not associated with a particular buffer,
+ * this just returns without doing anything. That can happen when an
+ * extension caches tuples in its own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
            uint16 infomask, TransactionId xid)
 {
+   if (BufferIsInvalid(buffer))
+       return;
+
    if (TransactionIdIsValid(xid))
    {
        /* NB: xid must be known committed here! */

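On the caller side, the visibility check of a cached tuple then looks roughly
like the sketch below, following cache_scan_needs_next() in the attached
cscan.c.

#include "postgres.h"
#include "access/htup_details.h"
#include "storage/bufmgr.h"
#include "utils/snapshot.h"
#include "utils/tqual.h"

/*
 * Sketch: a tuple that lives only in the cache has no backing buffer, so
 * InvalidBuffer is passed down and, with the change above, SetHintBits()
 * silently skips the hint-bit update.
 */
static bool
cached_tuple_is_visible(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
{
	bool	visible;

	/* a heap tuple needs its buffer share-locked; a cached tuple has none */
	if (BufferIsValid(buffer))
		LockBuffer(buffer, BUFFER_LOCK_SHARE);

	visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);

	if (BufferIsValid(buffer))
		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

	return visible;
}
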
Thanks,

2013/11/13 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

So, are you thinking it is a feasible approach to focus on custom-scan
APIs during the upcoming CF3, then table-caching feature as use-case
of this APIs on CF4?

Sure. If you work on this extension after CF3, and it reveals that the
custom scan stuff needs some adjustments, there would be time to do that
in CF4. The policy about what can be submitted in CF4 is that we don't
want new major features that no one has seen before, not that you can't
make fixes to previously submitted stuff. Something like a new hook
in vacuum wouldn't be a "major feature", anyway.

Thanks for this clarification.
Three days are too short to write a patch; however, two months may be
sufficient to develop a feature on top of the scheme discussed in the
previous commitfest.

Best regards,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-4.v5.patch (application/octet-stream)
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    2 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   83 ++
 contrib/cache_scan/ccache.c                        | 1395 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  668 ++++++++++
 src/backend/access/heap/pruneheap.c                |   13 +
 src/backend/utils/time/tqual.c                     |    7 +
 src/include/access/heapam.h                        |    7 +
 10 files changed, 2225 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..4e68b68
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..43567e2
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_invalidation_trigger()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..04b53ef
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,2 @@
+DROP FUNCTION public.cache_scan_invalidation_trigger() CASCADE;
+
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..79b9f1e
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,83 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greater node, if it exists */
+	struct ccache_chunk *left;	/* link to the lesser node, if it exists */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLockId		lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_invalidation_trigger(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+
+#define CS_DEBUG(fmt,...)	\
+	elog(INFO, "%s:%d " fmt, __FUNCTION__, __LINE__, __VA_ARGS__)
+
+static inline const char *
+ctid_to_cstring(ItemPointer ctid)
+{
+	char buf[1024];
+
+	snprintf(buf, sizeof(buf), "(%u,%u)",
+			 ctid->ip_blkid.bi_hi << 16 | ctid->ip_blkid.bi_lo,
+			 ctid->ip_posid);
+	return pstrdup(buf);
+}
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..bf70266
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1395 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for the column-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	volatile int	lwlocks_usage;
+	LWLockId	   *lwlocks;
+	dlist_head	   *slots;
+} ccache_hash;
+
+/*
+ * Data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory, which is then split into fixed-length memory blocks.
+ * All memory allocation and release is done by block, to avoid the memory
+ * fragmentation that would eventually complicate the implementation.
+ *
+ * The shmseg_head has a spinlock and a global free_list to link free blocks.
+ * Its blocks[] array contains shmseg_block structures that point to the
+ * address of the associated memory block.
+ * A shmseg_block chained in the free_list of shmseg_head is available for
+ * allocation; otherwise, the block is already allocated somewhere.
+ */
+typedef struct {
+	dlist_node		chain;
+	Size			address;
+} shmseg_block;
+
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	shmseg_block	blocks[FLEXIBLE_ARRAY_MEMBER];	
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * A callback to put any ccache_head still acquired locally, to keep the
+ * reference counters consistent.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define MAX_DEPTH(chunk) Max((chunk)->l_depth, (chunk)->r_depth)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = MAX_DEPTH(cchunk->right) + 1;
+		cchunk->upper = rchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = MAX_DEPTH(rchunk->left) + 1;
+		rchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = MAX_DEPTH(rchunk) + 1;
+		}
+		else
+		{
+			upper->right = rchunk;
+			upper->r_depth = MAX_DEPTH(rchunk) + 1;
+		}
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = MAX_DEPTH(cchunk->left) + 1;
+		cchunk->upper = lchunk;
+
+		lchunk->right = cchunk;
+		lchunk->l_depth = MAX_DEPTH(lchunk->right) + 1;
+		lchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = MAX_DEPTH(lchunk) + 1;
+		}
+		else
+		{
+			upper->left = lchunk;
+			upper->l_depth = MAX_DEPTH(lchunk) + 1;
+		}
+	}
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple into the ccache_head, dropping the columns
+ * that are not cached. If no space is left, it expands the t-tree
+ * structure with a newly allocated chunk. If no shared memory space is
+ * left, it returns false.
+ */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	int			i, required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
+
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		HeapTuple	movtup = cchunk->tuples[i_min];
+		Size		movlen = HEAPTUPLESIZE + MAXALIGN(movtup->t_len);
+		char	   *destaddr = (char *)movtup + movlen - required;
+
+		Assert(ItemPointerCompare(&tuple->t_self, &movtup->t_self) < 0);
+
+		memmove((char *)cchunk + cchunk->usage - required,
+				(char *)cchunk + cchunk->usage,
+				((Size)movtup + movlen) - ((Size)cchunk + cchunk->usage));
+		for (i=cchunk->ntups; i > i_min; i--)
+		{
+			HeapTuple	temp;
+
+			temp = (HeapTuple)((char *)cchunk->tuples[i-1] - required);
+			cchunk->tuples[i] = temp;
+			temp->t_data = (HeapTupleHeader)((char *)temp->t_data - required);
+		}
+		cchunk->tuples[i_min] = newtup = (HeapTuple)destaddr;
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+		cchunk->usage -= required;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+	else
+	{
+		cchunk->usage -= required;
+		newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+		cchunk->tuples[i_min] = newtup;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+	Assert(cchunk->ntups < 10000);
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	Assert(cchunk->ntups > 0);
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+		}
+	}
+	else
+	{
+		if (required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->usage += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	ccache_chunk *cchunk;
+	int			required;
+	int			i, j;
+	bool		rc;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	cchunk = ccache->root_chunk;
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+		rc = true;
+	}
+	else
+		rc = ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+
+	return rc;
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple according to the supplied ItemPointer and ScanDirection.
+ * With NoMovementScanDirection, it returns the tuple that has exactly the
+ * same ItemPointer. With ForwardScanDirection, it returns the tuple with
+ * the least ItemPointer greater than the supplied one; with
+ * BackwardScanDirection, the tuple with the greatest ItemPointer smaller
+ * than the supplied one.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the properties of a tuple that is already cached,
+ * usually for deletion.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_chunk
+ *
+ * It merges two chunks if they have enough free space to consolidate
+ * their contents into one.
+ */
+static void
+ccache_merge_chunk(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	ccache_chunk   *curr;
+	ccache_chunk  **upper;
+	int			   *p_depth;
+	int				i;
+	bool			needs_rebalance = false;
+
+	/* find the least right node that has no left node */
+	upper = &cchunk->right;
+	p_depth = &cchunk->r_depth;
+	curr = cchunk->right;
+	while (curr != NULL)
+	{
+		if (!curr->left)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			Size	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if (offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *rchunk = curr->right;
+
+				/* merge contents */
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					cchunk->usage -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+
+					cchunk->tuples[cchunk->ntups++] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->right;
+				*p_depth = curr->r_depth;
+				if (rchunk)
+					rchunk->upper = curr->upper;
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->left;
+		p_depth = &curr->l_depth;
+		curr = curr->left;
+	}
+
+	/* find the greatest left node that has no right node */
+	upper = &cchunk->left;
+	p_depth = &cchunk->l_depth;
+	curr = cchunk->left;
+	while (curr != NULL)
+	{
+		if (!curr->right)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			Size	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if (offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *lchunk = curr->left;
+				Size			offset;
+
+				/* merge contents */
+				memmove((char *)cchunk + cchunk->usage - shift,
+						(char *)cchunk + cchunk->usage,
+						shmseg_blocksize - cchunk->usage);
+				for (i=cchunk->ntups - 1; i >= 0; i--)
+				{
+					HeapTuple	temp
+						= (HeapTuple)((char *)cchunk->tuples[i] - shift);
+
+					cchunk->tuples[curr->ntups + i] = temp;
+					temp->t_data = (HeapTupleHeader)((char *)temp +
+													 HEAPTUPLESIZE);
+				}
+				cchunk->usage -= shift;
+				cchunk->ntups += curr->ntups;
+
+				/* merge contents */
+				offset = shmseg_blocksize;
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					offset -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + offset);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+					cchunk->tuples[i] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->left;
+				*p_depth = curr->l_depth;
+				if (lchunk)
+					lchunk->upper = curr->upper;
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->right;
+		p_depth = &curr->r_depth;
+		curr = curr->right;
+	}
+	/* Rebalance the tree, if needed */
+	if (needs_rebalance)
+		ccache_rebalance_tree(ccache, cchunk);
+}
+
+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims tuples that have already been vacuumed. It is invoked from
+ * the heap_page_prune_hook callback to keep the contents of the cache in
+ * sync with the on-disk image.
+ */
+static void
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+			ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+			ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+	}
+	else
+	{
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+			if (i_min < cchunk->ntups - 1)
+			{
+				int		j;
+
+				memmove((char *)cchunk + cchunk->usage + length,
+						(char *)cchunk + cchunk->usage,
+						(Size)tuple - ((Size)cchunk + cchunk->usage));
+				for (j=i_min + 1; j < cchunk->ntups; j++)
+				{
+					HeapTuple	temp;
+
+					temp = (HeapTuple)((char *)cchunk->tuples[j] + length);
+					cchunk->tuples[j-1] = temp;
+					temp->t_data
+						= (HeapTupleHeader)((char *)temp->t_data + length);
+				}
+			}
+			cchunk->usage += length;
+			cchunk->ntups--;
+		}
+	}
+	/* merge chunks if this chunk has enough space to merge */
+	ccache_merge_chunk(ccache, cchunk);
+}
+
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* XXX the buffer must be valid and pinned here */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		int		i;
+
+		PG_TRY();
+		{
+			for (i=0; i < 20; i++)
+			{
+				entry = MemoryContextAlloc(TopMemoryContext,
+										   sizeof(ccache_entry));
+				dlist_push_tail(&ccache_free_list, &entry->chain);
+			}
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+static void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+	int				i;
+
+	/*
+	 * There is no columnar cache of this relation, or the cached attributes
+	 * are not sufficient to run the required query. So, it tries to create
+	 * a new ccache_head for the upcoming cache scan.
+	 * Also allocate more entries, if we have no free ccache_head any more.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	i = cs_ccache_hash->lwlocks_usage++ % ccache_hash_size;
+	new_cache->lock = cs_ccache_hash->lwlocks[i];
+	new_cache->refcnt = 2;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find an existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			if (old_cache)
+				attrs_used = bms_union(attrs_used, &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. This routine does not support
+ * variable-length allocation, in order to keep the logic simple.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	shmseg_block   *block;
+	void		   *address = NULL;
+	int				retry = 2;
+
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			cs_put_ccache_nolock(ccache);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+		block = dlist_container(shmseg_block, chain, dnode);
+
+		memset(&block->chain, 0, sizeof(dlist_node));
+
+		address = (void *) block->address;
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size	curr = (Size) address;
+	Size	base = cs_shmseg_head->base_address;
+	ulong	index;
+	shmseg_block *block;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	block = &cs_shmseg_head->blocks[index];
+
+	dlist_push_head(&cs_shmseg_head->free_list, &block->chain);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	Size	curr_address;
+	ulong	i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash = ShmemInitStruct("cache_scan: hash of columnar cache",
+									 MAXALIGN(sizeof(ccache_hash)) +
+									 MAXALIGN(sizeof(LWLockId) *
+											  ccache_hash_size) +
+									 MAXALIGN(sizeof(dlist_node) *
+											  ccache_hash_size),
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	cs_ccache_hash->lwlocks = (void *)(&cs_ccache_hash[1]);
+	cs_ccache_hash->slots
+		= (void *)(&cs_ccache_hash->lwlocks[ccache_hash_size]);
+
+	for (i=0; i < ccache_hash_size; i++)
+		cs_ccache_hash->lwlocks[i] = LWLockAssign();
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 shmseg_num_blocks * shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+
+	curr_address = MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+
+	cs_shmseg_head->base_address = curr_address;
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		shmseg_block   *block = &cs_shmseg_head->blocks[i];
+
+		block->address = curr_address;
+		dlist_push_tail(&cs_shmseg_head->free_list, &block->chain);
+
+		curr_address += shmseg_blocksize;
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							256,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   shmseg_num_blocks * shmseg_blocksize);
+	RequestAddinLWLocks(ccache_hash_size);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..6da6b4a
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,668 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing a
+ * column-oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool cache_scan_disabled;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	int				total_width = 0;
+	int				tuple_width = 0;
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row reference is not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Estimation of average width of cached tuples - it does not make
+	 * sense to construct a new cache if its average width is more than
+	 * 30% of the raw data.
+	 */
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		Form_pg_attribute attr = tupdesc->attrs[i];
+		int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+		int		width;
+
+		if (attr->attlen > 0)
+			width = attr->attlen;
+		else
+			width = get_attavgwidth(tableoid, attr->attnum);
+
+		total_width += width;
+		if (bms_is_member(attidx, *attrs_used))
+			tuple_width += width;
+	}
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		if ((double)tuple_width / (double)total_width > 0.3)
+			return false;
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+static bool
+cs_relation_has_invalidator(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_invalidator = false;
+	bool	has_on_update_invalidator = false;
+	bool	has_on_delete_invalidator = false;
+
+	 * Cacheable tables must have an invalidation trigger on INSERT, UPDATE and DELETE.
+	 * Cacheable tables must have invalidation trigger on UPDATE and DELETE.
+	 */
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_invalidation_trigger") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0 &&
+				(trig->tgtype & (TRIGGER_TYPE_ROW | TRIGGER_TYPE_AFTER))
+							 == (TRIGGER_TYPE_ROW | TRIGGER_TYPE_AFTER))
+			{
+				if ((trig->tgtype & TRIGGER_TYPE_INSERT) != 0)
+					has_on_insert_invalidator = true;
+				if ((trig->tgtype & TRIGGER_TYPE_UPDATE) != 0)
+					has_on_update_invalidator = true;
+				if ((trig->tgtype & TRIGGER_TYPE_DELETE) != 0)
+					has_on_delete_invalidator = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+	if (has_on_insert_invalidator &&
+		has_on_update_invalidator &&
+		has_on_delete_invalidator)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (cache_scan_disabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* The core code should already have acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_invalidator(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache scan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+        required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
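+/*
+ * cs_init_custom_scan_plan
+ *
+ * It fills up the CustomScan plan node; the qualifiers are extracted from
+ * the supplied RestrictInfo list, and the bitmap of referenced attributes
+ * is carried over from the path's custom_private.
+ */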
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* nothing special to push down here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
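+/*
+ * cs_begin_custom_scan
+ *
+ * It decides how this scan shall run: if the cache is not built yet, this
+ * backend constructs it while scanning the heap with SnapshotAny; if
+ * another backend is constructing it, or no cache is available, it falls
+ * back to a normal sequential scan; otherwise tuples are fetched from the
+ * cache.
+ */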
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_SHARED);
+		if (ccache->status != CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(ccache->lock);
+			LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
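+/*
+ * cache_scan_needs_next
+ *
+ * Returns true if the fetched tuple is not visible to the snapshot, so the
+ * caller has to fetch the next candidate; false on end of scan or when a
+ * visible tuple was found.
+ */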
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility;
+}
+
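+/*
+ * cache_scan_next
+ *
+ * It fetches the next tuple, either from the heap (when running as a
+ * normal sequential scan or while constructing the cache) or from the
+ * T-tree cache, skipping tuples that are not visible to the snapshot.
+ */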
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	if (csstate->normal_seqscan)
+	{
+		tuple = heap_getnext(scan, estate->es_direction);
+		if (HeapTupleIsValid(tuple))
+			ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+		else
+			ExecClearTuple(slot);
+		return slot;
+	}
+
+	do {
+		if (csstate->ccache)
+		{
+			ccache_head	   *ccache = csstate->ccache;
+
+			if (csstate->with_construction)
+			{
+				tuple = heap_getnext(scan, estate->es_direction);
+
+				LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+				if (HeapTupleIsValid(tuple))
+				{
+					if (ccache_insert_tuple(ccache, rel, tuple))
+						LWLockRelease(ccache->lock);
+					else
+					{
+						LWLockRelease(ccache->lock);
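+						/*
+						 * Insertion failed (no cache space left); put the
+						 * cache twice to drop both the scan's reference and
+						 * the initial creation reference, discarding the
+						 * half-built cache.
+						 */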
+						cs_put_ccache(ccache);
+						cs_put_ccache(ccache);
+						csstate->ccache = NULL;
+					}
+				}
+				else
+				{
+					ccache->status = CCACHE_STATUS_CONSTRUCTED;
+					LWLockRelease(ccache->lock);
+				}
+				buffer = scan->rs_cbuf;
+			}
+			else
+			{
+				LWLockAcquire(ccache->lock, LW_SHARED);
+				tuple = ccache_find_tuple(ccache->root_chunk,
+										  &csstate->curr_ctid,
+										  estate->es_direction);
+				if (HeapTupleIsValid(tuple))
+				{
+					ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+					tuple = heap_copytuple(tuple);
+				}
+				LWLockRelease(ccache->lock);
+				buffer = InvalidBuffer;
+			}
+		}
+		else
+		{
+			Assert(scan != NULL);
+			tuple = heap_getnext(scan, estate->es_direction);
+			buffer = scan->rs_cbuf;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	if (csstate->ccache)
+		cs_put_ccache(csstate->ccache);
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
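+/*
+ * cache_scan_invalidation_trigger
+ *
+ * Row-level AFTER trigger that keeps the cache synchronized with INSERT,
+ * UPDATE and DELETE on the underlying table.
+ */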
+Datum
+cache_scan_invalidation_trigger(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	TriggerEvent	tg_event = trigdata->tg_event;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	if (!TRIGGER_FIRED_AFTER(tg_event) ||
+		!TRIGGER_FIRED_FOR_ROW(tg_event))
+		elog(ERROR, "%s: not fired by AFTER FOR EACH ROW event", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		if (TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else
+			elog(ERROR, "%s: fired by unsupported event", tg_name);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_invalidation_trigger);
+
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE needs cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.disabled",
+							 "turn on/off the cache_scan feature at run-time",
+							 NULL,
+							 &cache_scan_disabled,
+							 false,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own state with
+	 * the heap image on disk when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * If the supplied HeapTuple is not associated with a particular buffer,
+ * this function simply returns without doing anything. That can happen
+ * when an extension caches tuples in its own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
#2KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Kohei KaiGai (#1)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Hello,

I revisited the patch for the contrib/cache_scan extension.
The previous one had a problem: it crashed when a T-tree node had to be
rebalanced while merging nodes.

Even though the contrib/cache_scan portion is more than 2,000 lines of code,
what I'd like to discuss first are the core enhancements: running an MVCC
snapshot check on cached tuples, and getting a callback on vacuumed pages
for cache synchronization.
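
As a quick illustration of how an extension is expected to consume these
two enhancements, here is a minimal sketch (it is not part of the patch,
and my_cache_drop_page() is just a hypothetical helper of such an extension):

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/tqual.h"

/* hypothetical helper that drops cached tuples of one heap page */
extern void my_cache_drop_page(Oid relid, BlockNumber blkno);

static heap_page_prune_hook_type prev_prune_hook = NULL;

/*
 * Visibility check on a cached tuple; passing InvalidBuffer makes
 * SetHintBits() a no-op, so no heap buffer has to be loaded.
 */
static bool
cached_tuple_is_visible(HeapTuple tuple, Snapshot snapshot)
{
    return HeapTupleSatisfiesVisibility(tuple, snapshot, InvalidBuffer);
}

/* Prune callback: reclaim cached copies of tuples vacuumed on this page */
static void
my_prune_callback(Relation relation, Buffer buffer, int ndeleted,
                  TransactionId OldestXmin, TransactionId latestRemovedXid)
{
    if (prev_prune_hook)
        (*prev_prune_hook)(relation, buffer, ndeleted,
                           OldestXmin, latestRemovedXid);
    my_cache_drop_page(RelationGetRelid(relation),
                       BufferGetBlockNumber(buffer));
}

void
_PG_init(void)
{
    prev_prune_hook = heap_page_prune_hook;
    heap_page_prune_hook = my_prune_callback;
}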

Any comments please.

Thanks,

(2014/01/15 0:06), Kohei KaiGai wrote:

Hello,

The attached patch is what we discussed just before the commit-fest:Nov.

It implements an alternative way to scan a particular table using on-memory
cache instead of the usual heap access method. Unlike buffer cache, this
mechanism caches a limited number of columns on the memory, so memory
consumption per tuple is much smaller than the regular heap access method,
thus it allows much larger number of tuples on the memory.

I'd like to extend this idea to implement a feature to cache data according to
column-oriented data structure to utilize parallel calculation processors like
CPU's SIMD operations or simple GPU cores. (Probably, it makes sense to
evaluate multiple records with a single vector instruction if contents of
a particular column is put as a large array.)
However, this patch still keeps all the tuples in row-oriented data format,
because row <=> column translation makes this patch bigger than the
current form (about 2KL), and GPU integration needs to link proprietary
library (cuda or opencl) thus I thought it is not preferable for the upstream
code.

Also note that this patch needs part-1 ~ part-3 patches of CustomScan
APIs as prerequisites because it is implemented on top of the APIs.

One thing I have to apologize is, lack of documentation and source code
comments around the contrib/ code. Please give me a couple of days to
clean-up the code.
Aside from the extension code, I put two enhancement on the core code
as follows. I'd like to have a discussion about adequacy of these enhancement.

The first enhancement is a hook on heap_page_prune() to synchronize
internal state of extension with changes of heap image on the disk.
It is not avoidable to hold garbage, increasing time by time, on the cache,
thus needs to clean up as vacuum process doing. The best timing to do
is when dead tuples are reclaimed because it is certain nobody will
reference the tuples any more.

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
bool        marked[MaxHeapTuplesPerPage + 1];
} PruneState;
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
/* Local functions */
static int heap_prune_chain(Relation relation, Buffer buffer,
OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, Transacti
onId OldestXmin,
* and update FSM with the remaining space.
*/
+   /*
+    * This callback allows extensions to synchronize their own status with
+    * heap image on the disk, when this buffer page is vacuumed.
+    */
+   if (heap_page_prune_hook)
+       (*heap_page_prune_hook)(relation,
+                               buffer,
+                               ndeleted,
+                               OldestXmin,
+                               prstate.latestRemovedXid);
return ndeleted;
}

The second enhancement makes SetHintBits() accepts InvalidBuffer to
ignore all the jobs. We need to check visibility of cached tuples when
custom-scan node scans cached table instead of the heap.
Even though we can use MVCC snapshot to check tuple's visibility,
it may internally set hint bit of tuples thus we always needs to give
a valid buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately,
it kills all the benefit of table cache if it takes to load the heap buffer
being associated with the cached tuple.
So, I'd like to have a special case handling on the SetHintBits() for
dry-run when InvalidBuffer is given.

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid,
Snapshot snapshot);
*
* The caller should pass xid as the XID of the transaction to check, or
* InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
*/
static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid)
{
+   if (BufferIsInvalid(buffer))
+       return;
+
if (TransactionIdIsValid(xid))
{
/* NB: xid must be known committed here! */

Thanks,

2013/11/13 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

So, are you thinking it is a feasible approach to focus on custom-scan
APIs during the upcoming CF3, then table-caching feature as use-case
of this APIs on CF4?

Sure. If you work on this extension after CF3, and it reveals that the
custom scan stuff needs some adjustments, there would be time to do that
in CF4. The policy about what can be submitted in CF4 is that we don't
want new major features that no one has seen before, not that you can't
make fixes to previously submitted stuff. Something like a new hook
in vacuum wouldn't be a "major feature", anyway.

Thanks for this clarification.
3 days are too short to write a patch, however, 2 month may be sufficient
to develop a feature on top of the scheme being discussed in the previous
comitfest.

Best regards,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-4.v5.patch (text/plain; charset=Shift_JIS)
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   68 +
 contrib/cache_scan/ccache.c                        | 1410 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  761 +++++++++++
 doc/src/sgml/cache-scan.sgml                       |  224 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/custom-scan.sgml                      |   14 +
 doc/src/sgml/filelist.sgml                         |    1 +
 src/backend/access/heap/pruneheap.c                |   13 +
 src/backend/utils/time/tqual.c                     |    7 +
 src/include/access/heapam.h                        |    7 +
 14 files changed, 2559 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..4e68b68
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..d06156e
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,68 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greater node, if any */
+	struct ccache_chunk *left;	/* link to the lesser node, if any */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLockId		lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..0bb9ff4
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1410 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for the column-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	volatile int	lwlocks_usage;
+	LWLockId	   *lwlocks;
+	dlist_head	   *slots;
+} ccache_hash;
+
+/*
+ * Data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory, then splits it into fixed-length memory blocks.
+ * All memory allocation and release is done by block, to avoid the memory
+ * fragmentation that would eventually make the implementation complicated.
+ *
+ * The shmseg_head has a spinlock and a global free_list to link free blocks.
+ * Its blocks[] array contains shmseg_block structures, each pointing at the
+ * address of the associated memory block.
+ * A shmseg_block chained in the free_list of shmseg_head is available for
+ * allocation; otherwise, the block is already allocated somewhere.
+ */
+typedef struct {
+	dlist_node		chain;
+	Size			address;
+} shmseg_block;
+
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	shmseg_block	blocks[FLEXIBLE_ARRAY_MEMBER];	
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * It is a callback to put any ccache_head still acquired locally, to keep
+ * the reference counter consistent.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+							  shmseg_num_blocks * shmseg_blocksize)))
+
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == 0 ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk);
+		}
+		else
+		{
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple into the ccache_head, dropping the columns
+ * that are not cached. If no space is left in the target chunk, it expands
+ * the T-tree structure with a newly allocated chunk. If no shared memory
+ * space is left, it returns false.
+ */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+
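+/*
+ * do_insert_tuple
+ *
+ * It stores the tuple into a chunk that is known to have enough free space.
+ * Tuple bodies are packed downward from the tail of the chunk, while the
+ * tuples[] pointer array is kept sorted by ctid.
+ */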
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	int			i, required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
+
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		HeapTuple	movtup = cchunk->tuples[i_min];
+		Size		movlen = HEAPTUPLESIZE + MAXALIGN(movtup->t_len);
+		char	   *destaddr = (char *)movtup + movlen - required;
+
+		Assert(ItemPointerCompare(&tuple->t_self, &movtup->t_self) < 0);
+
+		memmove((char *)cchunk + cchunk->usage - required,
+				(char *)cchunk + cchunk->usage,
+				((Size)movtup + movlen) - ((Size)cchunk + cchunk->usage));
+		for (i=cchunk->ntups; i > i_min; i--)
+		{
+			HeapTuple	temp;
+
+			temp = (HeapTuple)((char *)cchunk->tuples[i-1] - required);
+			cchunk->tuples[i] = temp;
+			temp->t_data = (HeapTupleHeader)((char *)temp->t_data - required);
+		}
+		cchunk->tuples[i_min] = newtup = (HeapTuple)destaddr;
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+		cchunk->usage -= required;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+	else
+	{
+		cchunk->usage -= required;
+		newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+		cchunk->tuples[i_min] = newtup;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+}
+
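+/*
+ * copy_tuple_properties
+ *
+ * It copies the system columns (ctid, xmin/xmax and the visibility-related
+ * infomask bits) from oldtup to newtup, so that visibility checks on the
+ * cached copy behave the same way as on the heap tuple.
+ */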
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->usage += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple that satisfies the supplied ItemPointer according to
+ * the ScanDirection. If NoMovementScanDirection, it returns a tuple that
+ * has strictly same ItemPointer. On the other hand, it returns a tuple
+ * that has the least ItemPointer greater than the supplied one if
+ * ForwardScanDirection, and also returns a tuple with the greatest
+ * ItemPointer smaller than the supplied one if BackwardScanDirection.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the properties of a tuple that is already cached,
+ * usually on its deletion from the heap.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_chunk
+ *
+ * It merges two chunks if they have enough free space to consolidate
+ * their contents into one.
+ */
+static void
+ccache_merge_chunk(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	ccache_chunk   *curr;
+	ccache_chunk  **upper;
+	int			   *p_depth;
+	int				i;
+	bool			needs_rebalance = false;
+
+	/* find the least right node that has no left node */
+	upper = &cchunk->right;
+	p_depth = &cchunk->r_depth;
+	curr = cchunk->right;
+	while (curr != NULL)
+	{
+		if (!curr->left)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			long	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if ((long)offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *rchunk = curr->right;
+
+				/* merge contents */
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					cchunk->usage -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+
+					cchunk->tuples[cchunk->ntups++] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->right;
+				*p_depth = curr->r_depth;
+				if (rchunk)
+					rchunk->upper = curr->upper;
+
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->left;
+		p_depth = &curr->l_depth;
+		curr = curr->left;
+	}
+
+	/* find the greatest left node that has no right node */
+	upper = &cchunk->left;
+	p_depth = &cchunk->l_depth;
+	curr = cchunk->left;
+
+	while (curr != NULL)
+	{
+		if (!curr->right)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			long	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if ((long)offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *lchunk = curr->left;
+				Size			offset;
+
+				/* merge contents */
+				memmove((char *)cchunk + cchunk->usage - shift,
+						(char *)cchunk + cchunk->usage,
+						shmseg_blocksize - cchunk->usage);
+				for (i=cchunk->ntups - 1; i >= 0; i--)
+				{
+					HeapTuple	temp
+						= (HeapTuple)((char *)cchunk->tuples[i] - shift);
+
+					cchunk->tuples[curr->ntups + i] = temp;
+					temp->t_data = (HeapTupleHeader)((char *)temp +
+													 HEAPTUPLESIZE);
+				}
+				cchunk->usage -= shift;
+				cchunk->ntups += curr->ntups;
+
+				/* merge contents */
+				offset = shmseg_blocksize;
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					offset -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + offset);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+					cchunk->tuples[i] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->left;
+				*p_depth = curr->l_depth;
+				if (lchunk)
+					lchunk->upper = curr->upper;
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->right;
+		p_depth = &curr->r_depth;
+		curr = curr->right;
+	}
+	/* Rebalance the tree, if needed */
+	if (needs_rebalance)
+		ccache_rebalance_tree(ccache, cchunk);
+}
+
+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims tuples that have already been vacuumed. It is invoked from
+ * the heap_page_prune_hook callback to synchronize the contents of the
+ * cache with the on-disk image.
+ */
+static void
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+			ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+			ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+	}
+	else
+	{
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+			if (i_min < cchunk->ntups - 1)
+			{
+				int		j;
+
+				memmove((char *)cchunk + cchunk->usage + length,
+						(char *)cchunk + cchunk->usage,
+						(Size)tuple - ((Size)cchunk + cchunk->usage));
+				for (j=i_min + 1; j < cchunk->ntups; j++)
+				{
+					HeapTuple	temp;
+
+					temp = (HeapTuple)((char *)cchunk->tuples[j] + length);
+					cchunk->tuples[j-1] = temp;
+					temp->t_data
+						= (HeapTupleHeader)((char *)temp->t_data + length);
+				}
+			}
+			cchunk->usage += length;
+			cchunk->ntups--;
+		}
+	}
+	/* merge chunks if this chunk has enough space to merge */
+	ccache_merge_chunk(ccache, cchunk);
+}
+
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* XXX the buffer must be valid and pinned here */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		int		i;
+
+		PG_TRY();
+		{
+			for (i=0; i < 20; i++)
+			{
+				entry = MemoryContextAlloc(TopMemoryContext,
+										   sizeof(ccache_entry));
+				dlist_push_tail(&ccache_free_list, &entry->chain);
+			}
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+static void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+	int				i;
+
+	/*
+	 * There is no columnar cache for this relation, or the cached attributes
+	 * are not sufficient to run the required query, so we create a new
+	 * ccache_head for the upcoming cache scan.
+	 * Also allocate more entries if no free ccache_head remains.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	i = cs_ccache_hash->lwlocks_usage++ % ccache_hash_size;
+	new_cache->lock = cs_ccache_hash->lwlocks[i];
+	new_cache->refcnt = 2;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
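+/*
+ * cs_get_ccache
+ *
+ * It looks up a columnar cache of the given table that covers all the
+ * attributes in attrs_used, incrementing its reference counter. If none
+ * exists and create_on_demand is true, a new (empty) cache is created.
+ */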
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find out existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			if (old_cache)
+				attrs_used = bms_union(attrs_used, &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length shared memory block. Variable-length allocation
+ * is intentionally not supported, to keep the allocation logic simple.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	shmseg_block   *block;
+	void		   *address = NULL;
+	int				retry = 2;
+
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			cs_put_ccache_nolock(ccache);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+		block = dlist_container(shmseg_block, chain, dnode);
+
+		memset(&block->chain, 0, sizeof(dlist_node));
+
+		address = (void *) block->address;
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size	curr = (Size) address;
+	Size	base = cs_shmseg_head->base_address;
+	ulong	index;
+	shmseg_block *block;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	block = &cs_shmseg_head->blocks[index];
+
+	dlist_push_head(&cs_shmseg_head->free_list, &block->chain);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	Size	curr_address;
+	ulong	i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash = ShmemInitStruct("cache_scan: hash of columnar cache",
+									 MAXALIGN(sizeof(ccache_hash)) +
+									 MAXALIGN(sizeof(LWLockId) *
+											  ccache_hash_size) +
+									 MAXALIGN(sizeof(dlist_node) *
+											  ccache_hash_size),
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	cs_ccache_hash->lwlocks = (void *)(&cs_ccache_hash[1]);
+	cs_ccache_hash->slots
+		= (void *)(&cs_ccache_hash->lwlocks[ccache_hash_size]);
+
+	for (i=0; i < ccache_hash_size; i++)
+		cs_ccache_hash->lwlocks[i] = LWLockAssign();
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 shmseg_num_blocks * shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+
+	curr_address = MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+
+	cs_shmseg_head->base_address = curr_address;
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		shmseg_block   *block = &cs_shmseg_head->blocks[i];
+
+		block->address = curr_address;
+		dlist_push_tail(&cs_shmseg_head->free_list, &block->chain);
+
+		curr_address += shmseg_blocksize;
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							256,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   shmseg_num_blocks * shmseg_blocksize);
+	RequestAddinLWLocks(ccache_hash_size);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..0a63c2e
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,761 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing a
+ * column-oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool cache_scan_disabled;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	int				total_width = 0;
+	int				tuple_width = 0;
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row reference is not supported */
+			/* OID and whole-row references are not supported */
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Estimation of average width of cached tuples - it does not make
+	 * sense to construct a new cache if its average width is more than
+	 * 30% of the raw data.
+	 */
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		Form_pg_attribute attr = tupdesc->attrs[i];
+		int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+		int		width;
+
+		if (attr->attlen > 0)
+			width = attr->attlen;
+		else
+			width = get_attavgwidth(tableoid, attr->attnum);
+
+		total_width += width;
+		if (bms_is_member(attidx, *attrs_used))
+			tuple_width += width;
+	}
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		if ((double)tuple_width / (double)total_width > 0.3)
+			return false;
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have columner-cache also needs to have trigger for
+ * synchronization, to ensure the on-memory cache keeps the latest contents
+ * of the heap. It returns TRUE, if supplied relation has triggers that
+ * invokes cache_scan_synchronizer on appropriate context. Elsewhere, FALSE
+ * shall be returned.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgfoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgfoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (cache_scan_disabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* The core code should already have acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache scan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+        required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* nothing special to push down here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_SHARED);
+		if (ccache->status != CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(ccache->lock);
+			LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch a tuple that is invisible to our snapshot, because the
+ * columnar cache stores all live tuples, including ones updated or deleted
+ * by concurrent sessions, so checking MVCC visibility is the caller's job.
+ * This function decides whether we need to advance to the next tuple due to
+ * the visibility condition. If the given tuple is NULL, it is time to stop
+ * searching because no more tuples remain on the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility ? true : false;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	/* in case of the fallback path, we don't need to do anything special. */
+	if (csstate->normal_seqscan)
+	{
+		tuple = heap_getnext(scan, estate->es_direction);
+		if (HeapTupleIsValid(tuple))
+			ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+		else
+			ExecClearTuple(slot);
+		return slot;
+	}
+	Assert(csstate->ccache != NULL);
+
+	/* otherwise, we either scan or construct the columnar cache */
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		/*
+		 * "with_construction" means the columnar cache is under construction,
+		 * so we need to fetch a tuple from the heap of the target relation and
+		 * insert it into the cache. Note that we use SnapshotAny to fetch
+		 * all the tuples, both visible and invisible ones, so it is our
+		 * responsibility to check tuple visibility according to the snapshot
+		 * of the current estate.
+		 * The same applies when we fetch tuples from the cache, without
+		 * referencing any heap buffer.
+		 */
+		if (csstate->with_construction)
+		{
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+			if (HeapTupleIsValid(tuple))
+			{
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple failed, it usually means we ran
+					 * out of shared memory and cannot continue constructing
+					 * the columnar cache.
+					 * So, we put it twice to reset its reference counter
+					 * to zero and release the shared memory blocks.
+					 */
+					LWLockRelease(ccache->lock);
+					cs_put_ccache(ccache);
+					cs_put_ccache(ccache);
+					csstate->ccache = NULL;
+				}
+			}
+			else
+			{
+				/*
+				 * If we reached the end of the relation, the columnar cache
+				 * has been fully constructed.
+				 */
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(ccache->lock);
+			}
+			buffer = scan->rs_cbuf;
+		}
+		else
+		{
+			LWLockAcquire(ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(ccache->lock);
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(ccache->lock);
+		cs_put_ccache(ccache);
+		if (needs_remove)
+			cs_put_ccache(ccache);
+	}
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * Trigger function to synchronize the columnar cache with the heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+                 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+                 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache if the cached table was altered or
+ * dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE needs cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * It is a callback invoked when a particular heap block gets vacuumed.
+ * On vacuuming, the space occupied by dead tuples is reclaimed and tuple
+ * locations may be moved.
+ * This routine also reclaims the space held by dead tuples on the columnar
+ * cache, according to the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.disabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_disabled,
+							 false,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..c4cc165
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,224 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache_scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap scan,
+   in case a previous scan has already loaded the contents of the table into
+   the cache.
+   Unlike the buffer cache, it holds the contents of a limited number of
+   columns rather than whole records, so it can hold a larger number of records
+   in the same amount of RAM. This characteristic is particularly useful for
+   analytic queries on tables with many columns and records.
+  </para>
+  <para>
+   Once this module gets loaded, it registers itself as a custom-scan provider,
+   which allows it to offer an additional scan path on regular relations using
+   the on-memory columnar cache instead of a regular heap scan.
+   It also serves as a proof-of-concept implementation of the custom-scan API,
+   which allows the core executor system to be extended.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded using the
+   <xref linkend="guc-shared-preload-libraries"> parameter so that it can
+   acquire the required amount of shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented with the <literal>cache_scan_synchronizer</>
+   function, which keeps the cache contents in sync with the latest
+   heap on <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> and
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can set up this extension according to the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create the synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create the synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
+    </para>
+   </step>
+  </procedure>
+ </sect2>
+
+ <sect2>
+  <title>How it works</title>
+  <para>
+   This module behaves in the usual fashion of
+   <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has
+   synchronizer triggers and the width of the referenced columns is less than
+   30% of the average record width.
+   The query optimizer then picks the cheapest path. If the chosen path
+   is a custom-scan path managed by <filename>cache_scan</>, it scans the
+   target relation using the columnar cache.
+   On the first run, it constructs the relation's cache along with a regular
+   sequential scan. On subsequent runs, it can scan the columnar cache without
+   referencing the heap.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> with the
+   <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, has one or more
+   chunks that act as nodes or leaves of a T-tree structure.
+   The <literal>cache_scan_debuginfo()</> function dumps useful information,
+   namely the properties of all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ctid
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each chunk
+   holds an array of partial tuples along with their minimum and maximum
+   <literal>ctid</> values. Its left node links to chunks holding tuples with
+   smaller <literal>ctid</> values, and its right node links to chunks holding
+   larger ones. This makes it possible to find tuples quickly when they need
+   to be invalidated because of heap updates by DDL, DML or vacuuming.
+  </para>
+  <para>
+   The columnar cache is not owned by a particular session, so its contents
+   are retained until the cache is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of each block on the shared memory
+      segment for the columnar cache. Changing it requires a postmaster restart.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for the columnar cache on demand.
+      Too large a block size reduces the flexibility of memory assignment, and
+      too small a block size spends too much management area per block.
+      So, we recommend keeping the default value, that is, 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks on the shared memory segment
+      for the columnar cache. Changing it requires a postmaster restart.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for the columnar cache on demand.
+      Too small a number of blocks reduces the flexibility of memory assignment
+      and may cause caches to be dropped unexpectedly.
+      So, we recommend configuring enough blocks to keep the contents of
+      the target relations in memory.
+      Its default is <literal>64</literal>, which is probably too small for
+      most real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots in the internal hash table
+      that tracks every columnar cache, hashed by the table's OID.
+      Its default is <literal>128</>; there is usually no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number we can cache on
+      the columnar cache. Because of the internal data representation, the
+      bitmap set used to track cached attributes has to be fixed-length.
+      Thus, the largest attribute number needs to be fixed in advance.
+      Its default is <literal>256</>, although most tables likely have fewer
+      than 100 columns.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 2002f60..3d8fd05 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &ctidscan;
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index f53902d..218a5fd 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -55,6 +55,20 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term><xref linkend="cache-scan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module scans the on-memory columnar cache
+      instead of the heap, if the target relation already has such a cache
+      constructed.
+      Unlike the buffer cache, it holds a limited number of columns that have
+      been referenced before, not all the columns in the table definition.
+      Thus, it can cache a much larger number of records in memory than the
+      buffer cache.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </para>
  <para>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index aa2be4b..10c7666 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
#3Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: KaiGai Kohei (#2)
3 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Hello,

Because of time pressure in the commit-fest:Jan, I tried to simplify the patch
for cache-only scan into three portions: (1) add a hook on heap_page_prune
for cache invalidation when a particular page is vacuumed, (2) add a check to
accept InvalidBuffer on SetHintBits, and (3) a proof-of-concept module for
cache-only scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once the on-memory columnar cache is constructed, it needs to be invalidated
whenever the heap pages backing the cache are modified. In the usual DML cases,
an extension can get control via row-level trigger functions for invalidation;
however, we currently have no way to get control when a page is vacuumed,
which is usually handled by the autovacuum process.
This patch adds a callback on heap_page_prune(), to allow extensions to prune
dead entries from their caches, not only from the heap pages.
I'd also like to hear about any other scenarios in which columnar cache entries
need to be invalidated, if they exist. It seems to me that object_access_hook
makes sense to cover the DDL and VACUUM FULL scenarios...
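
For reference, here is a minimal sketch of how an extension could chain into
this hook from its _PG_init(), following the same pattern the cache_scan
module uses; invalidate_my_block() is only a placeholder for the extension's
own cache maintenance, not a real function:

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* hypothetical helper: drop private cache entries referencing this block */
extern void invalidate_my_block(Oid relid, BlockNumber blkno);

static heap_page_prune_hook_type prev_page_prune_hook = NULL;

static void
my_on_page_prune(Relation relation, Buffer buffer, int ndeleted,
                 TransactionId OldestXmin, TransactionId latestRemovedXid)
{
    /* keep any previously installed hook working */
    if (prev_page_prune_hook)
        (*prev_page_prune_hook)(relation, buffer, ndeleted,
                                OldestXmin, latestRemovedXid);

    invalidate_my_block(RelationGetRelid(relation),
                        BufferGetBlockNumber(buffer));
}

void
_PG_init(void)
{
    prev_page_prune_hook = heap_page_prune_hook;
    heap_page_prune_hook = my_on_page_prune;
}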

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
When we want to check the visibility of tuples held in cache entries (which
are thus not associated with any particular shared buffer) using
HeapTupleSatisfiesVisibility, it internally tries to update the hint bits of
the tuples. However, that makes no sense for tuples that are not associated
with a shared buffer; by definition, cached tuple entries are not connected
to one. Having to load the whole buffer page just to set hint bits would be
pointless, because the purpose of the on-memory cache is to reduce disk
accesses.
This patch adds an exceptional condition to SetHintBits() to skip everything
if the given buffer is InvalidBuffer. It allows extensions to check tuple
visibility using the regular visibility check functions, without reinventing
the wheel themselves.
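
To illustrate what this enables, a rough sketch of how an extension could then
check visibility of a tuple kept in its own memory (the function name is made
up, and GetActiveSnapshot() stands in for whatever snapshot the caller has):

#include "postgres.h"
#include "access/htup.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"

/*
 * Sketch only: returns true if a tuple held in extension-private memory is
 * visible to the active snapshot. With this patch, SetHintBits() becomes a
 * no-op when InvalidBuffer is passed, so no heap page has to be read back
 * just to maintain hint bits.
 */
static bool
cached_tuple_is_visible(HeapTuple tuple)
{
    return HeapTupleSatisfiesVisibility(tuple,
                                        GetActiveSnapshot(),
                                        InvalidBuffer);
}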

(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of concept that implements
cache-only scan on top of the custom-scan interface.
It offers an alternative scan path on tables that have row-level triggers for
cache invalidation, if the total width of the referenced columns is less than
30% of the total width of the table definition. Thus, it can keep a much
larger, meaningful portion of the records in main memory.
The cache is invalidated in response to updates of the main heap in three
ways: by row-level triggers, by the object_access_hook on DDL, and by the
heap_page_prune hook. Once a column-reduced tuple gets cached, it is copied
into the cache memory from the shared buffer, which is why the visibility
check functions need to accept InvalidBuffer, as noted above.
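
To make the trigger-based invalidation concrete, here is a condensed sketch of
the event dispatch that the cache_scan_synchronizer trigger function in the
attached patch performs; my_cache_insert/my_cache_delete/my_cache_drop are
placeholders for the real cache-maintenance routines, not actual functions:

#include "postgres.h"
#include "commands/trigger.h"

/* placeholders for the extension's real cache-maintenance routines */
static void my_cache_insert(HeapTuple tup) { }
static void my_cache_delete(HeapTuple tup) { }
static void my_cache_drop(void) { }

static void
synchronize_cache(TriggerData *trigdata)
{
    TriggerEvent ev = trigdata->tg_event;

    if (TRIGGER_FIRED_AFTER(ev) && TRIGGER_FIRED_FOR_ROW(ev) &&
        TRIGGER_FIRED_BY_INSERT(ev))
        my_cache_insert(trigdata->tg_trigtuple);      /* new row */
    else if (TRIGGER_FIRED_AFTER(ev) && TRIGGER_FIRED_FOR_ROW(ev) &&
             TRIGGER_FIRED_BY_UPDATE(ev))
    {
        my_cache_insert(trigdata->tg_newtuple);       /* new version */
        my_cache_delete(trigdata->tg_trigtuple);      /* old version */
    }
    else if (TRIGGER_FIRED_AFTER(ev) && TRIGGER_FIRED_FOR_ROW(ev) &&
             TRIGGER_FIRED_BY_DELETE(ev))
        my_cache_delete(trigdata->tg_trigtuple);
    else if (TRIGGER_FIRED_AFTER(ev) && TRIGGER_FIRED_FOR_STATEMENT(ev) &&
             TRIGGER_FIRED_BY_TRUNCATE(ev))
        my_cache_drop();                              /* whole cache goes away */
}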

Please volunteer to review the patches, especially (1) and (2), which are
a very small portion of the whole.

Thanks,

2014-01-21 KaiGai Kohei <kaigai@ak.jp.nec.com>:

Hello,

I revisited the patch for contrib/cache_scan extension.
The previous one had a problem where it crashed on merging nodes when
a T-tree node had to be rebalanced.

Even though the contrib/cache_scan portion has more than 2KL of code,
what I'd like to discuss first is the portion of core
enhancements to run an MVCC snapshot check on the cached tuples, and
to get a callback on vacuumed pages for cache synchronization.

Any comments please.

Thanks,


2013/11/13 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

So, are you thinking it is a feasible approach to focus on custom-scan
APIs during the upcoming CF3, then table-caching feature as use-case
of this APIs on CF4?

Sure. If you work on this extension after CF3, and it reveals that the
custom scan stuff needs some adjustments, there would be time to do that
in CF4. The policy about what can be submitted in CF4 is that we don't
want new major features that no one has seen before, not that you can't
make fixes to previously submitted stuff. Something like a new hook
in vacuum wouldn't be a "major feature", anyway.

Thanks for this clarification.
3 days are too short to write a patch, however, 2 month may be sufficient
to develop a feature on top of the scheme being discussed in the previous
comitfest.

Best regards,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-contrib-cache-scan.v1.patch (application/octet-stream)
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   68 +
 contrib/cache_scan/ccache.c                        | 1410 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  761 +++++++++++
 doc/src/sgml/cache-scan.sgml                       |  224 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/custom-scan.sgml                      |   14 +
 doc/src/sgml/filelist.sgml                         |    1 +
 11 files changed, 2532 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..4e68b68
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..d06156e
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,68 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greater node, if any */
+	struct ccache_chunk *left;	/* link to the lesser node, if any */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLockId		lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..0bb9ff4
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1410 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for columns-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	volatile int	lwlocks_usage;
+	LWLockId	   *lwlocks;
+	dlist_head	   *slots;
+} ccache_hash;
+
+/*
+ * Data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory, then it shall be split into the fixed-length memory blocks.
+ * All memory allocation and release is done by block, to avoid memory
+ * fragmentation that would eventually make the implementation complicated.
+ *
+ * The shmseg_head has a spinlock and a global free_list to link free blocks.
+ * Its blocks[] array contains shmseg_block structures, each pointing at the
+ * address of its associated memory block.
+ * The shmseg_block entries chained in the free_list of shmseg_head are
+ * available for allocation; any other block is already allocated somewhere.
+ */
+typedef struct {
+	dlist_node		chain;
+	Size			address;
+} shmseg_block;
+
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	shmseg_block	blocks[FLEXIBLE_ARRAY_MEMBER];	
+} shmseg_head;
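+
+/*
+ * For illustration: with the default GUC settings (cache_scan.block_size of
+ * 2MB and cache_scan.num_blocks of 64), 128MB of shared memory is reserved
+ * for cache blocks at startup, and it is handed out and returned strictly
+ * in 2MB units.
+ */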
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
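+/*
+ * ccache_max_attribute_number
+ *
+ * Returns the number of bitmapwords needed to track attributes up to
+ * cache_scan.max_cached_attnum; it bounds the size of the Bitmapset
+ * embedded at the tail of ccache_head.
+ */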
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * A callback to put the ccache_head entries acquired by this backend, to
+ * keep their reference counters consistent when a resource owner goes away.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+							  shmseg_num_blocks * shmseg_blocksize)))
+
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == NULL ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
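+/*
+ * Illustration of the anticlockwise rotation performed below (the clockwise
+ * case is symmetric):
+ *
+ *        [A]                    [B]
+ *       /   \                  /   \
+ *     (L)   [B]      ==>     [A]   (R)
+ *          /   \            /   \
+ *        (C)   (R)        (L)   (C)
+ *
+ * A is the unbalanced chunk and B its right child; B becomes the new subtree
+ * root and its former left child C is re-attached as A's right child.
+ */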
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk);
+		}
+		else
+		{
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple (with uncached columns dropped) into the
+ * ccache_head. If no space is left, it expands the t-tree structure with
+ * a newly allocated chunk. If no shared memory space is left either, it
+ * returns false.
+ */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	int			i, required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
+
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		HeapTuple	movtup = cchunk->tuples[i_min];
+		Size		movlen = HEAPTUPLESIZE + MAXALIGN(movtup->t_len);
+		char	   *destaddr = (char *)movtup + movlen - required;
+
+		Assert(ItemPointerCompare(&tuple->t_self, &movtup->t_self) < 0);
+
+		memmove((char *)cchunk + cchunk->usage - required,
+				(char *)cchunk + cchunk->usage,
+				((Size)movtup + movlen) - ((Size)cchunk + cchunk->usage));
+		for (i=cchunk->ntups; i > i_min; i--)
+		{
+			HeapTuple	temp;
+
+			temp = (HeapTuple)((char *)cchunk->tuples[i-1] - required);
+			cchunk->tuples[i] = temp;
+			temp->t_data = (HeapTupleHeader)((char *)temp->t_data - required);
+		}
+		cchunk->tuples[i_min] = newtup = (HeapTuple)destaddr;
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+		cchunk->usage -= required;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+	else
+	{
+		cchunk->usage -= required;
+		newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+		memcpy(newtup, tuple, HEAPTUPLESIZE);
+		newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+		memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+		cchunk->tuples[i_min] = newtup;
+		cchunk->ntups++;
+
+		Assert(cchunk->usage >= offsetof(ccache_chunk, tuples[cchunk->ntups]));
+	}
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->usage += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple according to the supplied ItemPointer and ScanDirection.
+ * With NoMovementScanDirection, it returns the tuple that has exactly the
+ * same ItemPointer. With ForwardScanDirection, it returns the tuple with
+ * the least ItemPointer greater than the supplied one; with
+ * BackwardScanDirection, the tuple with the greatest ItemPointer smaller
+ * than the supplied one.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
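+	/*
+	 * Note: the branches below rely on the numeric values of ScanDirection
+	 * (BackwardScanDirection = -1, NoMovementScanDirection = 0,
+	 * ForwardScanDirection = 1) to step to the matching or adjacent entry.
+	 */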
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the transaction properties of a tuple that is already
+ * cached, typically to mark it as deleted.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_chunk
+ *
+ * It merges two chunks if they have enough free space to consolidate
+ * their contents into one.
+ */
+static void
+ccache_merge_chunk(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	ccache_chunk   *curr;
+	ccache_chunk  **upper;
+	int			   *p_depth;
+	int				i;
+	bool			needs_rebalance = false;
+
+	/* find the least right node that has no left node */
+	upper = &cchunk->right;
+	p_depth = &cchunk->r_depth;
+	curr = cchunk->right;
+	while (curr != NULL)
+	{
+		if (!curr->left)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			long	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if ((long)offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *rchunk = curr->right;
+
+				/* merge contents */
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					cchunk->usage -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+
+					cchunk->tuples[cchunk->ntups++] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->right;
+				*p_depth = curr->r_depth;
+				if (rchunk)
+					rchunk->upper = curr->upper;
+
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->left;
+		p_depth = &curr->l_depth;
+		curr = curr->left;
+	}
+
+	/* find the greatest left node that has no right node */
+	upper = &cchunk->left;
+	p_depth = &cchunk->l_depth;
+	curr = cchunk->left;
+
+	while (curr != NULL)
+	{
+		if (!curr->right)
+		{
+			Size	shift = shmseg_blocksize - curr->usage;
+			long	total_usage = cchunk->usage - shift;
+			int		total_ntups = cchunk->ntups + curr->ntups;
+
+			if ((long)offsetof(ccache_chunk, tuples[total_ntups]) < total_usage)
+			{
+				ccache_chunk   *lchunk = curr->left;
+				Size			offset;
+
+				/* merge contents */
+				memmove((char *)cchunk + cchunk->usage - shift,
+						(char *)cchunk + cchunk->usage,
+						shmseg_blocksize - cchunk->usage);
+				for (i=cchunk->ntups - 1; i >= 0; i--)
+				{
+					HeapTuple	temp
+						= (HeapTuple)((char *)cchunk->tuples[i] - shift);
+
+					cchunk->tuples[curr->ntups + i] = temp;
+					temp->t_data = (HeapTupleHeader)((char *)temp +
+													 HEAPTUPLESIZE);
+				}
+				cchunk->usage -= shift;
+				cchunk->ntups += curr->ntups;
+
+				/* merge contents */
+				offset = shmseg_blocksize;
+				for (i=0; i < curr->ntups; i++)
+				{
+					HeapTuple	oldtup = curr->tuples[i];
+					HeapTuple	newtup;
+
+					offset -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+					newtup = (HeapTuple)((char *)cchunk + offset);
+					memcpy(newtup, oldtup, HEAPTUPLESIZE);
+					newtup->t_data
+						= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+					memcpy(newtup->t_data, oldtup->t_data,
+						   MAXALIGN(oldtup->t_len));
+					cchunk->tuples[i] = newtup;
+				}
+
+				/* detach the current chunk */
+				*upper = curr->left;
+				*p_depth = curr->l_depth;
+				if (lchunk)
+					lchunk->upper = curr->upper;
+				/* release it */
+				cs_free_shmblock(curr);
+				needs_rebalance = true;
+			}
+			break;
+		}
+		upper = &curr->right;
+		p_depth = &curr->r_depth;
+		curr = curr->right;
+	}
+	/* Rebalance the tree, if needed */
+	if (needs_rebalance)
+		ccache_rebalance_tree(ccache, cchunk);
+}
+
+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims tuples that have already been vacuumed. It is invoked from
+ * the heap_page_prune_hook callback to keep the cache contents in sync
+ * with the on-disk image.
+ */
+static void
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+			ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+			ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+	}
+	else
+	{
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+			if (i_min < cchunk->ntups - 1)
+			{
+				int		j;
+
+				memmove((char *)cchunk + cchunk->usage + length,
+						(char *)cchunk + cchunk->usage,
+						(Size)tuple - ((Size)cchunk + cchunk->usage));
+				for (j=i_min + 1; j < cchunk->ntups; j++)
+				{
+					HeapTuple	temp;
+
+					temp = (HeapTuple)((char *)cchunk->tuples[j] + length);
+					cchunk->tuples[j-1] = temp;
+					temp->t_data
+						= (HeapTupleHeader)((char *)temp->t_data + length);
+				}
+			}
+			cchunk->usage += length;
+			cchunk->ntups--;
+		}
+	}
+	/* merge chunks if this chunk has enough space to merge */
+	ccache_merge_chunk(ccache, cchunk);
+}
+
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* XXX the buffer must be valid and pinned here */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
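+		/* live line pointers are kept; anything else is purged from cache */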
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		int		i;
+
+		PG_TRY();
+		{
+			for (i=0; i < 20; i++)
+			{
+				entry = MemoryContextAlloc(TopMemoryContext,
+										   sizeof(ccache_entry));
+				dlist_push_tail(&ccache_free_list, &entry->chain);
+			}
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+static void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+	int				i;
+
+	/*
+	 * There is no columnar cache for this relation, or the cached attributes
+	 * are not sufficient to run the required query, so we try to create a
+	 * new ccache_head for the upcoming cache-scan.
+	 * Also allocate new entries if no free ccache_head remains.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	i = cs_ccache_hash->lwlocks_usage++ % ccache_hash_size;
+	new_cache->lock = cs_ccache_hash->lwlocks[i];
+	new_cache->refcnt = 2;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find out existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			if (old_cache)
+				attrs_used = bms_union(attrs_used, &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. Variable-length allocation is
+ * intentionally not supported, to keep the allocation logic simple.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	shmseg_block   *block;
+	void		   *address = NULL;
+	int				retry = 2;
+
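+	/*
+	 * If no free block remains, evict the least recently used cache (at most
+	 * twice) so that its blocks return to the free list, then retry.
+	 */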
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			cs_put_ccache_nolock(ccache);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+		block = dlist_container(shmseg_block, chain, dnode);
+
+		memset(&block->chain, 0, sizeof(dlist_node));
+
+		address = (void *) block->address;
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size	curr = (Size) address;
+	Size	base = cs_shmseg_head->base_address;
+	ulong	index;
+	shmseg_block *block;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	block = &cs_shmseg_head->blocks[index];
+
+	dlist_push_head(&cs_shmseg_head->free_list, &block->chain);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	Size	curr_address;
+	ulong	i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash = ShmemInitStruct("cache_scan: hash of columnar cache",
+									 MAXALIGN(sizeof(ccache_hash)) +
+									 MAXALIGN(sizeof(LWLockId) *
+											  ccache_hash_size) +
+									 MAXALIGN(sizeof(dlist_node) *
+											  ccache_hash_size),
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	cs_ccache_hash->lwlocks = (void *)(&cs_ccache_hash[1]);
+	cs_ccache_hash->slots
+		= (void *)(&cs_ccache_hash->lwlocks[ccache_hash_size]);
+
+	for (i=0; i < ccache_hash_size; i++)
+		cs_ccache_hash->lwlocks[i] = LWLockAssign();
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 shmseg_num_blocks * shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+
+	curr_address = MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+
+	cs_shmseg_head->base_address = curr_address;
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		shmseg_block   *block = &cs_shmseg_head->blocks[i];
+
+		block->address = curr_address;
+		dlist_push_tail(&cs_shmseg_head->free_list, &block->chain);
+
+		curr_address += shmseg_blocksize;
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							256,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   shmseg_num_blocks * shmseg_blocksize);
+	RequestAddinLWLocks(ccache_hash_size);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..0a63c2e
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,761 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing a
+ * column-oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool cache_scan_disabled;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	int				total_width = 0;
+	int				tuple_width = 0;
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row reference is not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of the layout on the shared memory segment, we have to
+	 * restrict the largest attribute number in use, so that the Bitmapset
+	 * cannot grow beyond the space reserved in ccache_head.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Estimate the average width of the cached tuples - it does not make
+	 * sense to construct a new cache if the cached width is more than
+	 * 30% of the raw row width.
+	 */
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		Form_pg_attribute attr = tupdesc->attrs[i];
+		int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+		int		width;
+
+		if (attr->attlen > 0)
+			width = attr->attlen;
+		else
+			width = get_attavgwidth(tableoid, attr->attnum);
+
+		total_width += width;
+		if (bms_is_member(attidx, *attrs_used))
+			tuple_width += width;
+	}
+
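+	/*
+	 * If a cache already exists, assume 95% of the page reads can be
+	 * skipped; otherwise assume only 5%, and refuse to build a new cache
+	 * at all when the cached columns exceed 30% of the raw row width.
+	 */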
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		if ((double)tuple_width / (double)total_width > 0.3)
+			return false;
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have a columnar cache also needs synchronizer triggers,
+ * to ensure the on-memory cache keeps the latest contents of the heap.
+ * It returns TRUE if the supplied relation has triggers that invoke
+ * cache_scan_synchronizer in all the required contexts; otherwise it
+ * returns FALSE.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgfoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgfoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (cache_scan_disabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* The core code should have already acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache
+		 * scan, but it could still have required parameterization due to
+		 * LATERAL refs in its tlist.
+		 */
+        required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* nothing special to push down here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_SHARED);
+		if (ccache->status != CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(ccache->lock);
+			LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch an invisible tuple because the columnar cache stores all
+ * the live tuples, including ones updated or deleted by concurrent
+ * sessions, so it is the caller's job to check MVCC visibility.
+ * This routine decides whether we need to move on to the next tuple
+ * because of the visibility check. If the given tuple is NULL, it is
+ * obviously time to stop, because no more tuples are left on the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	/* in case of the fallback path, we don't need anything special */
+	if (csstate->normal_seqscan)
+	{
+		tuple = heap_getnext(scan, estate->es_direction);
+		if (HeapTupleIsValid(tuple))
+			ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+		else
+			ExecClearTuple(slot);
+		return slot;
+	}
+	Assert(csstate->ccache != NULL);
+
+	/* otherwise, we either scan or construct the columnar cache */
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		/*
+		 * "with_construction" means the columnar cache is under construction,
+		 * so we need to fetch a tuple from the heap of the target relation
+		 * and insert it into the cache. Note that we use SnapshotAny to
+		 * fetch all the tuples, both visible and invisible ones, so it is
+		 * our responsibility to check tuple visibility against the snapshot
+		 * of the current estate.
+		 * The same applies when we fetch tuples from the cache without
+		 * referencing the heap buffer.
+		 */
+		if (csstate->with_construction)
+		{
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+			if (HeapTupleIsValid(tuple))
+			{
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple fails, it usually means we ran
+					 * out of shared memory and cannot continue constructing
+					 * the columnar cache.
+					 * So, we put it twice to reset its reference counter
+					 * to zero and release the shared memory blocks.
+					 */
+					LWLockRelease(ccache->lock);
+					cs_put_ccache(ccache);
+					cs_put_ccache(ccache);
+					csstate->ccache = NULL;
+				}
+			}
+			else
+			{
+				/*
+				 * If we reached the end of the relation, the columnar
+				 * cache is now fully constructed.
+				 */
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(ccache->lock);
+			}
+			buffer = scan->rs_cbuf;
+		}
+		else
+		{
+			LWLockAcquire(ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(ccache->lock);
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(ccache->lock);
+		cs_put_ccache(ccache);
+		if (needs_remove)
+			cs_put_ccache(ccache);
+	}
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * The trigger function that synchronizes the columnar cache with the
+ * heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel;
+	HeapTuple		tuple;
+	HeapTuple		newtup;
+	HeapTuple		result = NULL;
+	const char	   *tg_name;
+	ccache_head	   *ccache;
+
+	/* make sure we were fired as a trigger before touching TriggerData */
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "cache_scan_synchronizer: not fired by trigger manager");
+
+	rel = trigdata->tg_relation;
+	tuple = trigdata->tg_trigtuple;
+	newtup = trigdata->tg_newtuple;
+	tg_name = trigdata->tg_trigger->tgname;
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+                 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+                 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache if the cached table gets altered
+ * or dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE needs cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * A callback function invoked when a particular heap block gets pruned
+ * or vacuumed. On vacuuming, the space occupied by dead tuples is
+ * reclaimed and tuple locations may move.
+ * This routine reclaims the corresponding space held by dead tuples on
+ * the columnar cache, following the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.disabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_disabled,
+							 false,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..c4cc165
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,224 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache-scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap scan,
+   in case a previous scan already holds the contents of the table in the
+   cache.
+   Unlike the buffer cache, it holds the contents of a limited number of
+   columns rather than whole records, so it can hold a larger number of records
+   in the same amount of RAM. This characteristic is useful to run
+   analytic queries on a table with many columns and records.
+  </para>
+  <para>
+   Once this module gets loaded, it registers itself as a custom-scan provider.
+   It offers an additional scan path on regular relations using the
+   on-memory columnar cache, instead of the regular heap scan.
+   It also serves as a proof-of-concept implementation built on
+   the custom-scan API that allows extensions to extend the core executor.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded via the
+   <xref linkend="guc-shared-preload-libraries"> parameter so it can acquire
+   a particular amount of shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented with the <literal>cache_scan_synchronizer</>
+   function, which keep the cache contents in sync with the latest
+   heap on <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> or
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can run this extension according to the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create the synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create the synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
+    </para>
+   </step>
+  </procedure>
+ </sect2>
+
+ <sect2>
+  <title>How it works</title>
+  <para>
+   This module works in the usual fashion of
+   <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has
+   synchronizer triggers and the total width of the referenced columns is
+   less than 30% of the average record width.
+   The query optimizer then picks the cheapest path. If the path chosen
+   is a custom-scan path managed by <filename>cache_scan</>, it runs on the
+   target relation using the columnar cache.
+   On the first run, it tries to construct the relation's cache along
+   with a regular sequential scan. On subsequent runs, it can scan
+   the columnar cache without referencing the heap.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> with
+   the <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, has one or more chunks
+   that perform as nodes or leaves of a T-tree structure.
+   The <literal>cache_scan_debuginfo()</> function can dump useful information,
+   namely the properties of all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ctid
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each chunk has
+   an array of partial tuples with min- and max- <literal>ctid</> values. Its left
+   node is linked to the chunks that hold tuples with smaller <literal>ctid</>, and
+   its right node is linked to the chunks that hold larger ones.
+   This makes it possible to find tuples quickly when they need to be invalidated
+   according to heap updates by DDL, DML or vacuuming.
+  </para>
+  <para>
+   The columnar cache is not owned by a particular session, so the cache is
+   retained until it is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of a block on the shared memory segment
+      for the columnar cache. A postmaster restart is needed for it to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at
+      startup time, then allocates blocks for the columnar cache on demand.
+      Too large a block size reduces the flexibility of memory assignment, and
+      too small a block size spends too much management area per block.
+      So, we recommend keeping the default value, which is 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks on the shared memory segment
+      for the columnar cache. A postmaster restart is needed for it to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at
+      startup time, then allocates blocks for the columnar cache on demand.
+      Too small a number of blocks reduces the flexibility of memory assignment
+      and may cause undesired cache dropping.
+      So, we recommend setting a number of blocks large enough to keep the
+      contents of the target relations in memory.
+      Its default is <literal>64</literal>, which is probably too small for
+      most real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots of the internal hash table
+      that links every columnar cache, distributed by table oid.
+      Its default is <literal>128</>; there is usually no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number we can cache on
+      the columnar cache. Because of the internal data representation, the bitmap
+      set used to track the cached attributes has to be fixed-length.
+      Thus, the largest attribute number needs to be fixed in advance.
+      Its default is <literal>128</>, although most tables likely have fewer than
+      100 columns.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 2002f60..3d8fd05 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &ctidscan;
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index f53902d..218a5fd 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -55,6 +55,20 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term><xref linkend="cache-scan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan referring to the on-memory
+      columnar cache instead of the heap, if the target relation already has
+      such a cache constructed.
+      Unlike the buffer cache, it holds a limited number of columns that have been
+      referenced before, rather than all the columns in the table definition.
+      Thus, it can cache a much larger number of records in memory than the
+      buffer cache.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </para>
  <para>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index aa2be4b..10c7666 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
pgsql-v9.4-heap_page_prune_hook.v1.patchapplication/octet-stream; name=pgsql-v9.4-heap_page_prune_hook.v1.patchDownload
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patchapplication/octet-stream; name=pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patchDownload
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
#4Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kohei KaiGai (#3)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Hello,

Because of time pressure in the commit-fest:Jan, I tried to simplifies the
patch
for cache-only scan into three portions; (1) add a hook on heap_page_prune
for cache invalidation on vacuuming a particular page. (2) add a check to
accept
InvalidBuffer on SetHintBits (3) a proof-of-concept module of cache-only
scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once on-memory columnar cache is constructed, then it needs to be
invalidated
if heap page on behalf of the cache is modified. In usual DML cases,
extension
can get control using row-level trigger functions for invalidation,
however, we right
now have no way to get control on a page is vacuumed, usually handled by
autovacuum process.
This patch adds a callback on heap_page_prune(), to allow extensions to
prune
dead entries on its cache, not only heap pages.
I'd also like to see any other scenario we need to invalidate columnar
cache
entries, if any exist. It seems to me object_access_hook makes sense to cover
the DDL and VACUUM FULL scenarios...

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
In case when we want to check visibility of the tuples on cache entries
(thus
no particular shared buffer is associated) using
HeapTupleSatisfiesVisibility,
it internally tries to update hint bits of tuples. However, that does
not make sense
for tuples that are not associated with a particular shared buffer.
By definition, tuple entries on the cache are not connected with
a particular shared buffer. If we need to load the whole buffer page just to
set
hint bits, it is totally nonsense, because the purpose of the on-memory cache is
to reduce disk accesses.
This patch adds an exceptional condition on SetHintBits() to skip anything
if the given buffer is InvalidBuffer. It allows to check tuple
visibility using regular
visibility check functions, without re-invention of the wheel by
themselves.

(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of the concept to
implement cache-
only scan on top of the custom-scan interface.
It tries to offer an alternative scan path on the table with row-level
triggers for
cache invalidation if the total width of the referenced columns is less than 30%
of the
total width of the table definition. Thus, it can keep a larger number of
records, restricted to their
meaningful portion, in main memory.
This cache shall be invalidated according to updates of the main heap. One way is the
row-level trigger, the second is the object_access_hook on DDL, and the third is the
heap_page_prune hook. Once a column-reduced tuple gets cached, it is
copied to the cache memory from the shared buffer, so it needs a feature
to ignore InvalidBuffer for visibility check functions.

I reviewed all three patches. The first two core PostgreSQL patches
are fine.
I have the following comments on the third patch, related to cache scan.

1. +# contrib/dbcache/Makefile

Makefile header comment is not matched with file name location.

2.+   /*
+   * Estimation of average width of cached tuples - it does not make
+   * sense to construct a new cache if its average width is more than
+   * 30% of the raw data.
+   */

Move the estimation of the average width of cached tuples into
the case where the cache is not found;
otherwise it is an overhead for the cache-hit scenario.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, &old_cache->attrs_used);

don't we need a check to see whether the average width is more than 30%?
During estimation it doesn't
include the other existing attributes.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth that needs to be set in a clockwise
rotation.

5. can you add some comments in the code with how the block is used?

6. In the do_insert_tuple function, I felt that moving the tuples and rearranging
their addresses is a little bit costly. How about the following way?

Always insert the tuple at the bottom of the block, where the empty
space starts, and store the corresponding reference pointers
at the start of the block in an array. As new tuples are
inserted, this array grows from the block start and the tuples from the block end.
You then just need to sort this array based on item pointers, with no need to update
the reference pointers.

In this case the movement is required only when a tuple is moved from
one block to another block, or whenever continuous
free space is not available to insert the new tuple. You can decide
based on how frequently the sorting will happen in general.

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups);
can go wrong when only one tuple is present in the block,
with an item pointer equal to the one we are searching for in the forward scan
direction.

8. I am not able to find a protection mechanism for insert/delete etc. of
a tuple in the T-tree. As this is shared memory, it can cause problems.

9. + /* merge chunks if this chunk has enough space to merge */
+ ccache_merge_chunk(ccache, cchunk);

Calling the chunk merge for every heap page prune callback is an
overhead for vacuum. The merge may also again lead
to node splits because of new data.

10. "columner" is present in some places of the patch. correct it.

11. In the cache_scan_next function, in case the cache insert fails because of
shared memory, the tuple pointer is not reset while the cache is NULL.
Because of this, the next record fetch trips the assert cache !=
NULL.

12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+ cs_put_ccache(ccache);

The cache is created with refcnt as 2, and sometimes put
cache is called twice to eliminate it, and sometimes a different approach is used.
It is a little bit confusing; can you explain with comments why 2
is required and how it is maintained?

13. A performance report is required to see how much impact the cache
synchronizer can cause on insert/delete/update operations.

14. The GUC variable "cache_scan_disabled" is missing from the docs description.

please let me know if you need any support.

Regards,
Hari Babu
Fujitsu Australia

#5Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Haribabu Kommi (#4)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

2014-02-12 14:59 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

I reviewed all the three patches. The first 1 and 2 core PostgreSQL patches
are fine.
And I have comments in the third patch related to cache scan.

Thanks for your volunteering.

1. +# contrib/dbcache/Makefile

Makefile header comment is not matched with file name location.

Ahh, it's an old name from when I started to implement it.

2.+   /*
+   * Estimation of average width of cached tuples - it does not make
+   * sense to construct a new cache if its average width is more than
+   * 30% of the raw data.
+   */

Move the estimation of average width calculation of cached tuples into
the case where the cache is not found,
otherwise it is an overhead for cache hit scenario.

You are right. If and when an existing cache is found and matches, its width is
obviously less than 30% of the total width.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, &old_cache->attrs_used);

can't we need the check to see the average width is more than 30%? During
estimation it doesn't
include the existing other attributes.

Indeed. It should drop some attributes from the existing cache if the total average
width grows beyond the threshold. Probably, we need to have a statistical
variable to track how many times or how recently each attribute was referenced.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth needs to be set in a clock wise
rotation.

Oops, nice catch.

5. can you add some comments in the code with how the block is used?

Sorry, I'll add it. A block is consumed from the head to store pointers to
tuples, and from the tail to store the contents of the tuples. A block can hold
multiple tuples as long as the tuple pointers growing from the head do not cross
into the area used for tuple contents. Anyway, I'll put it in the source code.
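
To make the layout a bit more concrete, here is a hedged, self-contained
illustration only (the type, field and macro names below are invented for
this sketch and are not the actual definitions in the patch):

#include <stdint.h>

#define BLOCK_DATA_SIZE  (2 * 1024 * 1024 - 3 * sizeof(uint32_t))

typedef struct
{
    uint32_t    head;   /* offset just past the last tuple-pointer slot  */
    uint32_t    tail;   /* offset of the lowest stored tuple body        */
    uint32_t    ntups;  /* number of tuple pointers stored from the head */
    char        data[BLOCK_DATA_SIZE];
} block_sketch;

/*
 * A new tuple of "len" bytes (plus one 4-byte pointer slot) fits only as
 * long as the pointer area growing from the head does not cross the tuple
 * bodies growing back from the tail.
 */
static inline int
block_has_room(const block_sketch *blk, uint32_t len)
{
    return blk->head + sizeof(uint32_t) + len <= blk->tail;
}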

6. In do_insert_tuple function I felt moving the tuples and rearranging
their addresses is little bit costly. How about the following way?

Always insert the tuple from the bottom of the block where the empty
space is started and store their corresponding reference pointers
in the starting of the block in an array. As and when the new tuple
inserts this array increases from block start and tuples from block end.
Just need to sort this array based on item pointers, no need to update
their reference pointers.

In this case the movement is required only when the tuple is moved from
one block to another block and also whenever if the continuous
free space is not available to insert the new tuple. you can decide based
on how frequent the sorting will happen in general.

It seems to me a reasonable suggestion.
Probably, an easier implementation is to replace an old block containing dead
space with a new block that contains only valid tuples, if and when the dead
space grows beyond a threshold of block usage.
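
Just to sketch that replace-and-swap idea (again a hedged illustration with
toy structures, not the patch's actual code): only the live tuples of an old
block are copied into a freshly allocated block, which is then swapped in, so
no in-place movement of tuple addresses is needed.

#include <stdbool.h>
#include <string.h>

typedef struct
{
    size_t  offset;     /* where the tuple body starts within data[] */
    size_t  length;     /* length of the tuple body                  */
    bool    dead;       /* marked dead by DELETE or by vacuum        */
} tuple_slot;

typedef struct
{
    size_t      tail;       /* tuple bodies grow downward from here */
    int         nslots;
    tuple_slot  slots[64];
    char        data[8192];
} cache_block;

/* copy only the live tuples of "oldblk" into the empty block "newblk" */
static void
compact_block(const cache_block *oldblk, cache_block *newblk)
{
    newblk->tail = sizeof(newblk->data);
    newblk->nslots = 0;

    for (int i = 0; i < oldblk->nslots; i++)
    {
        const tuple_slot *s = &oldblk->slots[i];

        if (s->dead)
            continue;                   /* dead space is simply not copied */

        newblk->tail -= s->length;
        memcpy(newblk->data + newblk->tail, oldblk->data + s->offset, s->length);
        newblk->slots[newblk->nslots].offset = newblk->tail;
        newblk->slots[newblk->nslots].length = s->length;
        newblk->slots[newblk->nslots].dead = false;
        newblk->nslots++;
    }
    /* the caller would then swap "newblk" in place of "oldblk" in the T-tree */
}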

7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups); can
go wrong when only one tuple present in the block
with the equal item pointer what we are searching in the forward scan
direction.

It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Please assume cchunk->ntups == 1. In this case,
any given ctid matches one of the two boundary checks, because any ctid is less
than, equal to, or greater than the only cached tuple; thus, it moves to the right
or left node according to the scan direction.

8. I am not able to find a protection mechanism in insert/delete and etc of
a tuple in Ttree. As this is a shared memory it can cause problems.

For design simplification, I put a giant lock per columnar-cache.
So, routines in cscan.c acquires exclusive lwlock prior to invocation of
ccache_insert_tuple / ccache_delete_tuple.

9. + /* merge chunks if this chunk has enough space to merge */
+ ccache_merge_chunk(ccache, cchunk);

calling the merge chunks for every call back of heap page prune is a
overhead for vacuum. After the merge which may again leads
to node splits because of new data.

OK, I'll check the condition to merge the chunks, to prevent too frequent
merge / split.

10. "columner" is present in some places of the patch. correct it.

Ahh, it should be "columnar".

11. In cache_scan_next function, incase of cache insert fails because of
shared memory the tuple pointer is not reset and cache is NULL.
Because of this during next record fetch it leads to assert as cache !=
NULL.

You are right. I need to switch the scan state over to the normal seqscan path,
not just set csstate->ccache to NULL.
That is an older manner left over from trial and error during implementation.

12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+ cs_put_ccache(ccache);

The cache is created with refcnt as 2 and in some times two times put
cache is called to eliminate it and in some times with a different approach.
It is little bit confusing, can you explain in with comments with why 2
is required and how it maintains?

I thought 2 is the same as create + get, so putting the cache at the end of the scan
does not release the cache. However, it might be confusing as you pointed out.
The process that creates the cache knows it is the creator process. So, all it
needs to do is exit the scan without putting the cache, if it successfully
created the cache and wants to leave the cache for a later scan.

13. A performance report is required to see how much impact it can cause on
insert/delete/update operations because of cache synchronizer.

OK, I'll try to measure the difference between them in the next patch submission.

14. The Guc variable "cache_scan_disabled" is missed in docs description.

OK,

Thanks, I'll submit a revised one within a couple of days.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#6Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kohei KaiGai (#5)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

2014-02-12 14:59 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups);

can

go wrong when only one tuple present in the block
with the equal item pointer what we are searching in the forward scan
direction.

It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Please assume the cchunk->ntups == 1. In this case,
any given ctid shall match either of them, because any ctid is less, equal
or
larger to the tuple being only cached, thus, it moves to the right or left
node
according to the scan direction.

yes you are correct. sorry for the noise.

8. I am not able to find a protection mechanism in insert/delete and

etc of

a tuple in Ttree. As this is a shared memory it can cause problems.

For design simplification, I put a giant lock per columnar-cache.
So, routines in cscan.c acquires exclusive lwlock prior to invocation of
ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for concurrency. Better to
analyze this once we have the performance report.

Regards,
Hari Babu
Fujitsu Australia

#7Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#6)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

8. I am not able to find a protection mechanism in insert/delete

and etc of

a tuple in Ttree. As this is a shared memory it can cause problems.

For design simplification, I put a giant lock per columnar-cache.
So, routines in cscan.c acquires exclusive lwlock prior to
invocation of
ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for the concurrency. Better to
analyze the same once we have the performance report.

Well, concurrent updates towards a particular table may cause lock contention
due to the giant lock.
On the other hand, one headache of mine is how to avoid deadlocks if we try to
implement it using finer-granularity locking. Please assume per-chunk locking.
It also needs to take a lock on the neighbor nodes when a record is moved out.
Concurrently, some other process may try to move another record in the inverse
order. That is a recipe for deadlock.

Is there an idea or reference for implementing concurrent tree structure updates?

Anyway, it is a good idea to measure the impact of concurrent updates on
cached tables, to find out the significance of lock splitting.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#8Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kouhei Kaigai (#7)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Thu, Feb 13, 2014 at 3:27 PM, Kouhei Kaigai wrote:

8. I am not able to find a protection mechanism in insert/delete

and etc of

a tuple in Ttree. As this is a shared memory it can cause

problems.

For design simplification, I put a giant lock per columnar-cache.
So, routines in cscan.c acquires exclusive lwlock prior to
invocation of
ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for the concurrency. Better to
analyze the same once we have the performance report.

Well, concurrent updates towards a particular table may cause lock
contention
due to a giant lock.
On the other hands, one my headache is how to avoid dead-locking if we try
to
implement it using finer granularity locking. Please assume per-chunk
locking.
It also needs to take a lock on the neighbor nodes when a record is moved
out.
Concurrently, some other process may try to move another record with
inverse
order. That is a ticket for dead-locking.

Is there idea or reference to implement concurrent tree structure updating?

Anyway, it is a good idea to measure the impact of concurrent updates on
cached tables, to find out the significance of lock splitting.

We can do some of the following things:
1. Let only insert take the exclusive lock.
2. Always follow the locking order from the root to the children (see the
sketch below).
3. For delete, take the exclusive lock only on the exact node where the delete
is happening.
We will identify some more based on the performance data.
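
To make point 2 concrete, here is a minimal, self-contained sketch of
hand-over-hand (lock-coupling) descent, written with toy pthread structures;
it only illustrates the ordering rule and is not the patch's actual code:

#include <pthread.h>
#include <stddef.h>

typedef struct toy_node
{
    pthread_mutex_t  lock;
    long             max_key;          /* keys <= max_key descend left */
    struct toy_node *left;
    struct toy_node *right;
} toy_node;

/* Every writer descends the same way: lock the child, then release the
 * parent. Because nobody ever locks "upward", two writers can never wait
 * on each other's next lock, which rules out the inverse-order deadlock. */
static toy_node *
descend_locked(toy_node *root, long key)
{
    toy_node *cur = root;

    pthread_mutex_lock(&cur->lock);
    for (;;)
    {
        toy_node *next = (key <= cur->max_key) ? cur->left : cur->right;

        if (next == NULL)
            return cur;                     /* caller unlocks cur->lock */

        pthread_mutex_lock(&next->lock);    /* child after parent ...   */
        pthread_mutex_unlock(&cur->lock);   /* ... then drop the parent */
        cur = next;
    }
}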

And one more interesting document I found on the net while searching for
concurrency in T-trees:
it says that a B-tree can outperform a T-tree as an in-memory index, at the
expense of a little more memory
usage. The document is attached to the mail.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

T-Tree_or_B-Tree_Main_Memory_Database_Index_Structure_Revisited.pdfapplication/pdf; name=T-Tree_or_B-Tree_Main_Memory_Database_Index_Structure_Revisited.pdfDownload
��u!i��.Q����g�q��U����������P�$�{HR�7�,Z`
��ff/^7��ffW,����=�7��??1&���3�������G;`�@�<�$���O~��������C���7~~���+����w�,�V��
���]��E��KxX����i�v���O��W�z�!N����)���h�����/`�@��DH����]����O�����_@�#�g��Mii
�c���y��g��Y'�T�S<�Ry4��nV�D{w�kQ����K�������uiA����nQe���b��wL��%���J=8�%��"�*��{�i1��v��	�8�W��4�,'�>���Nq
�`����V��7Cq��&�T=Pi1�{{ �~�}��dr�"cSw�?�MfZ��r*�*���e��K���?�z��<�w�2���;���Q���������"��
 
������bF`�f���������B��[����1��f�"�_��i�����pt����d0(�%v�_����p����>AY��t�$����t��e���)�O@��8�([���e{M�"�	���Q���#�����'":
��	C"��U*��	�_3����mH��b������q-��Q���{��8�6eR���%��x�$��v@�0�����-�LH)�	N$
:�f�s�J~�M�s1j=�| 	��]�]X����-����+2%4�X��?�aV
���:�
��;���o�"7A�0�V@���q����
��S����W��}����zD�M��F�����0�V���
�����}X(�M���O���*��i�@���V����%J���N��+�(��� S>����7�[����:�a#q�!i�i� �����X��Wk�������*��:~E����n|0^�y��0>��� r�qN����Y�T��6>��"��$	H*A��`��i*�&�4��K"�������LP�e��r��w����T���S<B,���7y�46vXB�������������9@��]����U�]������(����%�"�7������MX����t0����s?�&��0B��	t��:��L�����:�h����y�����9k����X�������oc�8(~M�9}.-��m��0�����q2���6�[����GL�#l��AOv�Li�fj�(�!2�1=�q@6�8H�[���.�fw1�V�&����(2!��S����DlMc���q�^�YE��M�o3�z�;��.5c?Y2oA ~�m��[��.��<y����8
 }'����)���#�D���������8Ab�{����#yvg���s�5?> >��1��b���i�!2����A�;
L��Y������S���?7L�4O�*��q����f>-9��`�5z{�u�d6��]�6la��	vo�=a+SX(������v�*��
��!q=��=w0z`9��Q$@��`�pw�{�h�s���$�LI~�g.W6���o8=S��m��=��^���S7$�{X�"�IH���t!	M����2��=���!�h���E@�]Y��`&0�lqP��x}xQ�����W��f�<S����������b������u�!�i��x��,G��o]v�%sv[�H������V�J�=/�A:���f��[��4f��a�C����8
 ��A��&�ghSR���P�kN�sB^�n|�
|�Y��/�O��EM�I�������i���l����T������������k�d�<
�l;�i�W�1M����E1<S��cJ�.�N�����F�q=�� �L�o`�WO�_�S��x�]:���VT971T-q`F�S%T�Q��h�6���)z��b�����?d�OCI/�� ��_���l*}>'�&C�	 �����>��lv��Vh�������_h5j�c,�A���a��q0mv�}�2C����+��F=0�!���a`�
 ��k�U(q��A�Fiw~��ExV��W7$� ��.�S�� ��4NtP��k'F�����@�J������Ei�m�����DFG"��������5��H��C�Vc<��O���C2D���4
 g.|_w�v��%d�����~��8��~6�<�����+9b�8'�"EM���8��~�:�,��8���I�����K�p��:v"��3l"3[D|�u��lX!@`0jH�y�.<k�:C(~�xz�s�M�-�����������Lig0j�a7�5Nqb�����!�k�,��&x������wCt���}���q��>Z�q��^�Zw��� a�������9dL��Q!e�A��7|����)$�'��kN3O����{�7bh��}B��a(������	�����W�F��m{�����C�Md�<��chA��1X3��}����:w����gs
	���:&��
C��) C��
;� z��~��!�+�2h���y�_�Q�B�y��}m{'�����CL��X@`�0
��f�{�T@&
�G�g���+p�
��g���+pF���,E�d����q��qX#��}�|��Rl���l�����3DDg�1��a��������_���B���e{]��nG��;
����c<f���2
��,�f\x52�t��y���L���c�Iq,��q��
gv��y�;��W{���!��\�{M�����3d�������dY'BF���!EI��(�d��}�3/�����
�\z��OQ��>@fyG�C�Y�a��c�(�43@T��%AL\��_w$�A">���"U�Q�H�P��)i�b�{��d�7�3��qc�k������
��V
�@���0��5����!H�w0�"��<�7fH��]����$8���p6G�N���w���_�f
�{������(X�T�D��x��"��DL��9�A�*F��r�������!��&K���3�����*jaXc�}	�Oa	����8*Xy����GJ�Z,@�����H����-�������]P���A�bW���6�
�<G���'H^Oeg���
��W?:�gI��a~;�1E^'
VQ�Zc��:.�g�W� j;
"��D���2�9G^�W

!���=�d�����Q��E3�} @��F�j�6q��
�)���J9!�1E,j���5�*
k�)$b1�
{����x��f����K_�>�pC`d�����Q
(��$��d�y�%H�Y�
��fr�14�����gVb��X\Pp/�d|�Q����k���U$c1aC�	Q^J��
��)o�XFy�eF��bv�<�t�F�s�������pf�����F�s�0���� ��h��J���!B4�?:�+�j�����4@�^WH�G��!��DH�k�N��HUh���kx\��R
iD���"d�/�L!+���N�E�O��5|�QN���Qnx5q�>���$� pa`q{�u��#D�} �����i� W��I��V�od������)�cN���2�����8d�B�6i���wG�p�T@��~4�q�6q�p�z�N8�|:R���g���Cd���sP�J��;����3�
^pQ������GH�9���n�
�aY8d��l__G
�F������)I�78G$: 8����atQ������� ��j
QIx�QyE�������5�
r�`"�� ��D�d2$n��}mF�63�_�Z�WX3��b�O��(���H!�
�^�N#�bx+�����{�
`F����QK���������!�XT����0
�K0���L!~=q�
4|���&5�_�1���7�^�j( &�#�R���e8$��<j��N/��D�A����$b(��2$��qAi�l�K_[��p�s�������y�$����!.�h\5O�d����8j]���Qq���~�(��c�z�l{�!S|�(�=���M{�3��uRll">f������Y`gi��
�c�2���9�~vs�{�'�D|�cI��<J$��p6G����M�R��0�b f;b����6�q�b��czzr��p�f��i
��"�pf5�l����mT�&�F�&�L%H��Kws�������Vi$�����1� SL���P�n���3d��z����0pf�:��.��i�L��`:}�^S�\��;�G
q�6a�-��k1d�3${C�(���  ��'x����)��i5O���9bh�9�&�(�n��
����	v�����{������>2�"�D����a�����1�}@\@�tf.q��l���2p��'4��T���
�W�(�r���W�z�����!t�>6��a���
����H
!�������!�<����t;\�����;K3��,���-�e&qf�y0
j���@4�,	c#�g�{#�,����y�1BCI�{M����g6����$�F����B�I����VHz�����F���I��"��$��4��
�:&���u���5A��-��ClZ#@��3�$����r�0�U�����!i���!bH�8K��P�9I���C���
$b1��8\�:�C��4m> "Z�5�����) ���Q.R�4;I2���8�m���J��������F��{�GA������jql3o��
�B;�W�(m8�W����C4���
�@:���C�D�@����Mf�  S�$��� qT���G�:H���9$J����bl�����K�b���� ��4�}(!S���qE�"F�;���P�K��G�EEHZ�g�x����!��0�di S���-R��8���@���@��8��`g���!q�#�#�L�$���3��D[!)"f���D��Q1��������5���2xI�8����8F�E�H�yT�!�(��3�@D��;KVm5�H1C���&�A"��Q�:��t_�^��es��(g�T�A@���3!��(�Y����Q`��{4*j�h�!B@�]SD=�[�^�t�=�
_��8�3Ld��Q.B`f�|��y��Ybb���@l��fI� �
�T����)<JEH�y����Mx�@�7����t_2D$���Q�
��!%� �%�* �Qq_y��
�y�^]�5���E
	6Cf��Q��7.�4H�����"�WCu%�4�G�H�yf�$Xd��E�`3$�����H�*�tk#�$R'��!y���	����t�V�~+����l��&��(}L6���OX�%�Q����=�����endstream
endobj
25 0 obj
19373
endobj
37 0 obj
<</Length 38 0 R/Filter /FlateDecode>>
stream
x��}�������_qwC�)�x�gUc�������3I�d�US�L���?2�;$"�_��2�����(y�������?UgS���������?5���/������d���~��������vlk�����}����Ow��g���?�����}�xn������o�����g���O/n�[Ms��C}z���Y��9u�y��4��������������;���_�/���L�����=����al�?�~�q�����<t�O/���d��������v���������s3��;w����N/>?{�����gc�����jj���,�����b[������'�m�����?�g�q����6����G������%�_����x�?�����S���y�H��5�m�Y3��������7�Awd���L��}��8����S�L��O��f]����H��_�����c?���Z�H7��^��d�_$�p��x~�6(�>u=}M&t<�
�P��hN�_,����f���l>���td�	O���?��Y�0�t���������1��,�����k}n�����n�K6?���j#������s'����1l��uv����S��j�����5~)<F������R�a��'oF��~��h
m���[��Z�Z7���y����M5.;��C��=.�����}��}"j��UU��.(���h0)9'v�kk���1����7����;_�7�<��~Z
}�?n���p����FOX���������+;�
���������g���
iV03V1��;���E-�df��vR!Rz�����u����*��kp����������bQ���q����]u�=x|���#y,��A�[��Q���0o��v�o��g_9� ����lt����=<h����]�P��������p�����v<4.Q�9�+k�����y����Od5��"�!��(���Ol���e��j�.��xtI�+�52�T��%�rW}$������w7��
�Z�BG_!�2Uc�G����$,�t�U��Z,�}��'2�������A���1����9���)����Cl��d5�[o���s����~AV���5?K�Zv�z�RVk������'���-�y^/�\0��t�c8r���K����]�oA���cl�a����5�[��K�,r[�
����a;6���Ui�uo�
����,��{h�!�C���~#��7f�.��f>�'6��@�=<`����L����DkO�����/��h�]E�z1.�����-)�8E�*�,|e��9�{,������8F���9@�A�����X�����2�vq�������������x����u38� |$���|�<�M{�#�(V��~8�}�}\Y�������o]�j��+�����9a��":wt21�}�l�_�p>��47�#
����
�0�\�����0(8�	]���M������������+?��bh�=���V����?�jX-J�����.A01ra�5s���.zZ*����i��y�f�>D����m����I�(���G�!�:z�)�q�����Kl8��&�����0Xpg�c�[g�NE��P�Fz�<�z��w��Pi
��c/������d�{��y88�R�}!�����e�*��E.?�.���#�z�=��[�e������]qe�4]�R+�i��Nt[x�sx����@+��]�����]���a�yo�	o��Msi~�(�
���
��l���^��(�$y��7���Hn}C�I>��M��^���$'�F��h|�
������s������}$����3B���,�n�����0!-&#-�;��:�1�xZ`J��|'P~�
5�3��;Bp�ya\'��E����c����+LA�w#��u�aG�������b���Ns�����|��2��Ll����3�'f~�C�)�%�ujkk�e\:��w9DZx����y6)��(��|F����{�"r��wsY���/[���
9T��p��6���<����'o�����O��������=�0����5j�q�{+=!��S%��c�����p�-�	�����v���Yp�"I�U�o���d@W�!?���[U���n��L��<�9�U�7�K%
g.Y�g���o�������7��<����%����:���+R>�&��?|��u`������g�E�yl?����C��Hl�����������y����W���P�]cAE`����?Q�?!�w�8M<�p	��F��c�q���Do��w��R'��:�T_u%gj}�L(���|B&<c`Y[����x�8/��P�($���5)����D�7�Ut��FTr����!���0������h���H���V�x�Pi&�t���_���[�05��+���p�"F�b��Fy�����.L��H�l\=$gu���p�0i�1P�G���I���"��h�6M/6�JT��JF�Z������?�����$L��FLJ���V��P?���8z
��a�c�e��b��iR�7$�68$�#�~����T�;�0P���!�(�
���:|\�>� ��%� �O�=3;���t�2�`��r�o)����gV����C�����=����Q�+���s8&���>C����nD��?]F	���1^�?�[b����|<�lJa���d�,�o���j���r�
�����-q��
��U~'��o&%1��).I��?�a�/�������B��^\��w3n����.EbJ����7+T�G`�M���<�O���������+�bE���c���r��x����]��p����:zx`&@FI`�yA��!���$�� �0�]�*�>����������.y�������w"��J���P	����dH2rm-NJ�(���X��C(�G��]����G�FHn��NMbk����
U<�T4!��-��F����~`Jt!�y���l�^��4#0����"���p�a���6���&i�H�����Y4ut�����B�����)�2B���[��LD�E03p�����<�*�fE��@�
'� �}����������@�o����jr���xRp!b'����G���^�����,:D�?/}����4�%!�,�6q��!��}��,������c�3qH%��-���]��!�>ay����X+\ �j`�� pB���I,S�SV\�P 	�3Q����	e;8lj0R�'��C(�Z��l��+�w��^��;2�N���J 8��E��&^���o-����� m�S:����p�'�1������"
��+�����V+�c%���m
�����y�{j^&i4���+:D�Srg���7p�������&*�4..�w��b6aD�h_�A���]��Y��1�th-o����{1��;�%L����*��J��
G7�*]������������{A\B�zaeY�K�;��0s^1���#Nd�n�n�}C'��I2/��O"D6��S�����`n�����{���ky��E��Ix�Y�}�`�����W��t��u�{}�d;�h�N��i\w���2	d�����m?��O�x,�=��(V��<?���.������uc��M��o(t�0����qQ(��BL��S���t6�mKCD���N�`j7U��U
v~���D���+��n�`�E5m���N�����k��%;�4X=��5��X�'g�x�BC)���JQ��$����3w�Sv4n!�Nk�q���'��`��|�rLL���y�3�B�X�8BC�>5 ��[u�s=����Fi����S����0�)���|�cjR�Z0���*��!F�E��6��G	�,>�l�
� $HEC��b�7d�x�R6�����k��g���O$,(���H�0(���cD�s���������-��$Bdd��s_Q���L"��2!4�0�1"q�W���������U�1
�;�8����V4}�X������Yv���(���Y~����`�|�N�b�2�&L@���n^�+y�~b�a.��o$����C~��@��'�l��UM��1�
Z��eZ��M5��cyo�Q('����"���O�5�e���32�n��J�U��5d3����0`4�3Il�:�����L�O�>�je�h��+��:��'��"Sa�GCe�]!*4-��'��0UY2�?�>�<p}�u�����;�����G�y�F��xu����=>H���ZM�VB�Z�:��U:ME�+�<��w'�4�f/��]K?���;��t�k��z	�1+�/������g@[�(PJ�hF<.������*�������w�\�{����>.Uq�V�>4���/�7?�9b=����������i�T�uf;><r�jr�-�.(�Z3M�<�8V���P��5�|��F�W��ATwC�Y���`�����dU]�r����p����H*����%���.K���OR�T���"B��@)�a[g/Rq��[����;�-��#������,'�kxS�b�s��f��Z����W8��z~��a.[�2���6j�l^y���eIi����bO/
������lK�h�@�:��5�����y��V8	��Jk���3�
�6���5��`y(1p���e_���df�����vS����Lq�14R��[Rsy�ZN��w��Qk��F\�����~�O%��
uUj����z^��t�hA���
���d����o�;
'�D D(�����I���V�H��k��� �f�F��|*����5F{�-yItq�D���tK���'A�������M�����i��0vS���d^]NU .��^���'3A��+2���|}�!O�z�|�p����C|=�����y}���h�|�F�mp`�����R�����_�����>�SCm��������o������������NJ,��~�N��x��f>rL�@eQ	������L�'k6�����������-��N��Y��e�(.���g��h���+����g���_���f�����������?�]�R�iT�����;J(\)������t��
�
������vc��]��E���^�Q��<���%�Et?vb[�^eRHH.��m��d��q�{t1�*t�}4/�iq����Sm���T@g^��,S�sK%��d���W�l�B�+H4�(���g���s ih^��M�vi�Y�{����J�����~]�f��F�[�T�>�����;�ix?������%k��L�x�%�����D����>�@fH�`��W@�1�AP���F�������e�%;! ���vd���B__����c���o��*�<�E��*T�����]���$B�\���\\I�\��S:�bq���N��"�k�r��\ ����QH���0��^���x���\��jZR�
��6���S��]�:{	��G������ ��^��A�		�%���)cD@h�S�A�����@����6�*�d��D��Wq�������x@��"�9���u�5\;	��c�E���.��Zf�����]�����D�I%��7���SB)���q�Wi>�]X���hBt����ug��2��9'b)�azZ����R�:x
��F�������82���h�Jz����!�<��Di5����o@Mv����H����oJt�7��'�����#k��[����e�k�O�kD%�9]�����N�I�v�I0�M��l���u������p��r��=(���>�����B��P�9�ZV�����_����U����|�0d����?W�(D�p�HU5��W��"6��UuG���XD��QQz��(���a2R!��Hy�i�Y��@���@��%�:��;��qp�����6Pi�2�5��%��$�r��G���h��QN�m.�?����Y#+���Q���<�*�����d>!��?��fw@����	��8Ck.��e-#���nB	���W��9����	d��&o����gE��y��5B~R���}H��<���g��|�W���~�t��L��o/i���6�����H��i�s�����3�����*Y�J>uy�Ux\��_��R�����x������m�#
I#d��2�M�?�<fF�����A���S?����B�^��$��t���ZB]�E�H�=�J9���<&2`��o�N5���G�P(��;�����>;R4wH_�(}
Hf������J������XK��S��21��F�iHV�DI��pK�e�Y9�3�fd�i������>��Gq���1B����:�oGB��ynn?w����"�����jg��/J��U�?��Jw=�������L�|	'���:2!��U,��Ab'�b��/�`�#��>O>��Eq�O��6Hf���zK�=sY�aW�����@���foru�~��7���d^�}%4�
���������F������c��x�{��2��\5��Ov@� �7���=�P6��#�h7������'~��=�s���w>�9�&�D�A��!��JUBvZC���F��ky2Z��j;��p�����U�,��o�����4�C29- �#�����4�:�z4D�t���a�5���(x��9(������O/��
�����q��P����(Iv��lq�3UC���zE���H!>R&�js �E��g��d:l�=O�L��:!Zg���f+1������A�|�Q@,�03%R�<���Q� ���P���b�E�|�9�L�Du�� 
���3r`�#��2���r��0�F0`��7��s��=����mmZ�\���x�
_����gB]�p
,4=*9<R��w�Q��O�ve���Q���*6�
X�	�������
lC�<�������?5pQ��we"��/!1�] ��Zc_�%D`yC��q:��0��H]Qb���*��,�_VQ�F�Z���/�_9�S�4��N������P�`�$�Q�c_���2�[+����\\���-����/���U��F��@ ���?�i<�5C���8��~
s��u��]��c��3>�p�����i}d�
���?n�n}v�b\����ol|M����w�/NwO�	�������i�v
K����m����
�T��M�
�nw��c)���Lor7i����=�A��d�|���Q��+4�
M]-^������b�I+��y�
��
�`��w~[�W�Q���z�*j���������p�0:�O����,�
xK�l�j�,�!W��f��$���1�E��*V^�)���A��/���K�����^Y� l9�}���^�8��z|�I���Cu]L���+��X9��{rVT�����Zq�?���%D�z����C�"�b����9�i�6��h���k.���|mIB+4#�^���M�t0������s\J��������{^�Y���Y���=�'[Xd	�����NJU1K<�Hs�/BV�����w*��OTE�U������8�@���TO�'T�*.��
�v:��l"�g�C�l:�?Xj-@0B����L��S���@����H*�&.���H��_x��;��3����\5�9�}=�|�������;�B�q�[{z����p�_�������3n�������| ��C��������5Z/��=��p����\@!�"5���zp�U?��zBbo�2�����*I� \�L�.1�;�P���?p���1�zv	^v���?�(XV������[VJ&����C���ZK�D&iu��������F��*7�m6����R�>.�sA|^ �p�H�i|+���?r|P�^b��(�|Gr�!���������J������t�!��p����P��}u��@�4GDD�QX�htz�1�9!�%B_�Z{�gp���K�zy�����<wxSM�#
7���W��=�ZHL_{A(�>�wC3������$��p7�8%��MM[�	%����!22#2�S�NL6�k�h|�
��;H��&���A�*;��#L�;�rz[�i�[	�?�^�&Dx;hI�������t�H�^���ZmD�O^�Kz�}�z$��o��%���S�$�������46���-k{	�#����!s���"��=�3�O��v8�Tn���x�B�Y��
�moi����<0-���:�0^�]���������A����P%�=�Zx~��W�i����.nv�����}��2����2����7��6�>�������D�v��J����hWq}~�V�i%���1a���B&���k��k�
���W�V���_E�a�����Q\�� ����������-�L3����~6w�'�hSSLx(@�����e.a���������
h��B+(G�U�aE���V���A���1�pR���Rd�
Bp�����\�D�a~N�$� ��B��;^����B�\&��b�z7�U��=�]�R����#���j��j6]=��r�^�Q�+�dp%�dH�]�����f���r-�}=g%@h�_l�W���9KJ���L�����hK�����7�8[����$z�S��a)lQNI����������S���e������T�O��$��hj��7�d�v���{_Y7��0�����|��nl���<�F�v d����I��c�k���/���k���a��d�������\���k����s5�@3j�p�
��]�����h�A�5Uc>[�2�*V���������R��,�x��_P��H2T��m����uV��jH����7���i�!��N�n�!U���Wz>�.��������*�L��9��)G��"n�=��X	bTYp=6�(x{�����X?EyaV�h�������Y��lvL��6��7��*��-��������@�$������f���
s&�h���Ai� ;�����7�2G�M��u������m�LM}����T����,�P� Y�H�r��|%�U�c�� <��� W6��K���K7N�q�~b���o����Y�x_�����CT���Ym�0�"�@+`
��e�pmF�Y$��u���z��}�����/�a=3�z�"OW)�&��t&J��6^dO�|Wc'��Yk��D���Nt���������D��2Y�K]�k������R�a���]����	��ts��R 0Rp�U��{���j>��f��_��n
HO����sz�j$����ze���&��R^$o|E��7�$F������x?�?����;�PR�|EYPk�r\m����P
�	1�����@�hP�M���'�;���=��&["#z����u�q����5��Vq/pZ�����v��m���S��p�vf�RZF�����	9���@�Fw�s���)��F9���=���s������E����q�{R���1G����=Q�z��V�~B����s=#Z�f�����z �Q�o/�L�m����C�um�a�`��d����G�Z��%S�Oy�J�47�_X����`����
������b�B�~�</���,"�e(S����u�J�P���`D�
Z%��"��Y����K��M�p�t&���X�T�J������Z�i0q��#�������(4mB����������Sb)��0���C���Rmv$%��(��CY,o�3)�b0���r���h�d�
rv����w���;�V�%���Y�:�o�����
�i�"��+��?���W��<�XW�}<1�0�����H���b�p3�����#<t���n�&����qaPY-��=�����S7BC'`��a�'<Z�X�TP�,/m�$� A�!��jM�d-����-,��-G;�+Vsc��:w	M���K�3k�Y)�H-&�:y%�z���WB���n*h]��
��8���"F�sTp4 ���57$���Rhee�0+��s��]��65�����c"K���^��������SWf�����V�1�����R��������D>,S]7�q�m��e�N�-Ts����Q�fy�dqC3��l�Q���~������PS���V����n���t�����0'.�M�}nY"��{�����L�'����:0/8�~�Om$�e[/��{�
L�<�5�����=yU]����I:��xM�G��I�n���KQ]�Y�<O<�
P"(�L�
�����f����U�����
e����?d8]\��<�:�&>��`p�0~��r���))pyX=��u�2��K8�U�L#�����wF��S�&����+`����S��K����:�	=
���S��$�a�8(+����5���Z�=�ix��nnH����x�IJac����Qae��i�~j���l"b��b��[�l�Lt�J�k�������F��-3

��B�9�SP���rE�1M��G�=?_[�� �Ac�v�;��f}*_o�����Rh%�e�������s;���b8U]���\WCx�c[.o�:������OD���������^�GC��|T��XZf�Wr�b��?�L��7�\2�I���8'b�����C�H^�e�It4BS�����Y����+z�����vhC�!��C�|�D��Y�������5����`S^�U��w�����9�r��[����������-��awO�^�H#<���r�v�%��XD����S�%�Tn9_.�N�b
h���,��	y��.n@DZa[���cb�1p��v���;��/�O���D5o/2�e��Q���|y�������1�F�4��
g^lS��.gm����Vh�o���;c�k��w&����D�����
���W�GH�a$��(G�j$(�[YMa�w'�<��%�Ll��,)��L*b�!�p�G ����XB�a�s{�L�k��Xf��cb�����X����HY_��+
z���i �����q�X[�F�^�$Y�������
�~_��?������A���(e��"{��sU������A:��M�T���
��o�7U+$}�&���#��W��h��Mv����'��� )�~AHs'M�J��(�������|�B�~A"��J�U��zIZ��r�����_9Co�O�S������g��p���L:�z��h�|����G��:��A=��0w;�~L�Z<���7t�[=Wx�S���K&~�G�*v&�T��uIh5�~�k��ga��XYl��/�'S%Ir�#_o}7�����dZO��������'�#�8#��*��-�s���)A����W^��~��W6p�"_F+2���i�$���Xv�Z�Y�=���r�
A������T�g��R~�=:x3�����������)���B�w������zI�I*��IbPOv�|��`jDS�&�"LC���L�k�*G�yr)����1+��!C�������3���h�����nP����7�Q=h�l�����r�qt5��U���Q��*�Qi�����1�h-���Dk�`CkJ���*}�O���cCA�9�)!�O� ��S�����%1bsL����x��v/D
e���dY��@�n���}9�Q�s�R������H���?wM�0��PXD��{�|w�����Ox�].�����"Gl�B���A�O~�@J�R��)��(��]%�U�%������H��������8��#��'�mC��=��t;K/��f����Q�	����LW��{zp�\���r|B��?
��?�@<@��Q�:��RV
t-��B���q�@�t�|L�"�(��c|6�~��lv��o���%S>�/{�j5��e�,��������da�A��C
$7�Hr����D���q��Bf>����4�l���G�
|5�H��*��m���D�"��T9u�:�i4�yz|����3D.Uvml����v�����Xq{�}_��&C/V>�
�����!2���*��_�N9v�@��%o[�7Ju�E��~���T�M.�&*
rLs���7<m��v	n_�sG��>��6�@��|�I�}�(eC��P��#1�ue�o��t����+�2MwF

j��1�
tt�|�����8z����5)����g�Xwo*�s���Ai(���q���gK���.��;�C����5���$JeJ]��
��f���s;0m�J��{��j���t������"�����,H�
�F��Q�iW+��K�Z���U���b'	��-Or���8=�/"�Z"�.h��"�RE�r�^+�r�=	q�q�+Z"S	W��8d0��A�9�4�)��|M����k����X����<>�wwthh��S��8XD�f��L���gP���z��{}���6�bf��� �eH��r��V��3�4�1F��&8�W�0Z�{��}Z��8;Gj�Q��]�3�4�
�����q�l��Z9�G��>!�pDf��X�k����?����s���[��X�j�h �c�H4!�� g4"C��N��	
nU�3���A����@@����������n�#1���������������	�����p�X��P�N�=���P�Ul1�UR���b�k7�3itq�z8�!���Q����x��K��U���"��@�Ym��*����X���Y�y��vsG�����%Jf��ya-���\������&�<���\�a��5��z}pB��`��3M�9����+d��egV��r�`�CvC]�R|>��00H��j�V��G�<!)�������
�W��:�%l!!����D���X�N��t�����&g$zJ�{K^�Mo�c1���j��%P������oA�)|h����T�R`Le��z����
>���� }��lui�hC`�)|�e9>�B�iq[�-V�m�0�z:�A�S{��=hw��c����%�3��u���d�����jW��>����WH����g���JH���Y�� K��D�B�VdI����^����L$} '��1k���������O+��
���	BM>�{�F�D3���-�; Xj��dBY���#��5
��`��R�[�D�_��U~n��O�����Q��C"���g�}���>!�
+�gf�<���]���^#�	�~�vc
�����*������L�	��R5Lqb������X�����?� xh$#��IV��|b_v�*f���Xy�x�N���{0W����r
�$��EP���U��x���l[$��^QG�^�&�E>�'"�c# 8m(4Ww����1�SB��,���O����V�^.8�z�T�QT1@k1[����Ec0k�
�2��ZYj��m��-�R��l3�vB�����'��;����8����!�A���!�YOT�j~����~Vb'"5<1�D��	�t���h%'�X�w�H�7�c����j�4���}�dCN�����0�����]{
H��r�j#����*'����g3�b���1[v�����%sC�(u�����#< ���A��'�$W��xh��3�Tp���I�6.~�p����<�ZVJ�d,U>F$����]��3�:+�L�QY����I��*�zs���WP���!�D�|�_ L��k�7������*�D�\���O�/���<
J�
L1%� \��X������&-}b�RY��gK�V��������1Z�c���.���_�wY��l��N�=G�j�,���%}�E��M�gK�Z����Ce����@d\�m����b�!�c�|\2Je�\�|��E��:��{�Dp'rTw���V�7U~��~G,�d�KbT�����V`�*"V�s��R�A�6j��F#y���J[�K�x�l����T��#�1wF	J�!�<Q���Qu�AoObgW����@)����,�ag��T�9&��@����c�y�Z��H�ek���O�K�@��T��]�`��TA���d�=�����f2������ C>��s�J .����?��FE��/n���j��,!c,���6�#uB���VL�
���"qS�9�
��=v����p0�^��vV#�.�2��#k ��kn�rKMf�Q����YAx����;��|�������"�
R,��9y0������W�qu��\S�L�
(�S����8/���=qnEX��Z_szA59�O�7��}���y4W9^?3s�Un���r
���T�P�WY�������FT>>���d����z$)����Mu�����C���,��I���>	
����f����'Xubpa����K�u�Qn� �|V�qD��]+�`�j����
S*�]��Z���B��|�����F�m�������FO���U�=����U���li���l	����[����[���"�-l����\ S�`h����~���L����!^i=b��U	@e���
��z��`_},��zooM��"+�@w��3���U?�%������_t��@�H]�'����s�(��sa`������(k�q�&�$����E��������������s��<�O�<g�7���*&�
�hqj���kH0�S[LjH��bt�(Gv���G����D�]��G6�`c�c���g32mGI1���a���������,
e{A��2�a"�X��IMih
>]���(�_��>���QM��cmE�v��l+b<6��4�e:@W�"����=U'��x�[�������1���C?�����]���# ����qK�����n�������R���<�<1��l1�
�\���R�����O�^���l�{�5."%�}I�l���n�.���a����{3���:�$!������C��L��&m;C�qn3��BG�X%{	�a��G����T���c��X&s��\�'��m? B�����J��HI�B�>n��
�������-g{#�N�����^sw��S�:&��\�r�%�-]�Reg��CWI�(�uM�'2�N���Yv�p�"�4���LO6Y<x��������E�?�z�6�.��SqBo��2Yy�'6��D����q�a��HX�:���?�=/����MN��Z&�,�,����R��}Q�P����o�q�)�-�����q{�0?���JoY�������8� �D0t�!�K�#itd�����F(�V������S�f�N:TA��s�������7���\Y'v#�U-U��F9�}���B5���������B���/y�
�@���M�[������d�Q(G���}ICE`=�d6p;b�.V�(+=��
%������i����*�^����SL!�5@��T	(
RH\G�"�=�sLi��PJa�J���|��o\v,'q���m$=��9O�|�O�[��=Q�x��)�`�:'L_N����e?h+G����5���p��I�/�H�������{�|#L�9��=���4"j��xF<��h�A���$���^�3���z�����d*���A�1��W��r�o��	�84�l���c��c�Rdx�Go�\-U���
��l�JvJ��P�	�T�-��K��cGT"��)���-X�'��)a������40�T�����f���5��Je�L:�PT����?������C�����
��&�2��|��U�	�X��`p���K�n���s��p	�6�9��N�?Y������[L;,\*Gv�)�K��,mI�o���*hU-�+��������%+�w�x�����<�>�O������-S�8|E���R�@(�x��0�����������dH�[��������pl��@��N{I8lF�"�ml���
�T�~����Ep��b�����SO�@�D�.Be
C� ���"!�t������=�M�&eq�L���Vw.*��kk��]���k�l"-\���!�!� ��$��u&�������v��@�^��TI��5e'��f���\\���6��@�+��C3��'����R��5��Z^��%������}.,�`�d�5P*���QK������-�����f;B��_
����M!P���?�a^T�������?�x����,|��endstream
endobj
38 0 obj
16228
endobj
43 0 obj
<</Length 44 0 R/Filter /FlateDecode>>
stream
x��}���F��;�"��2S������zzMcc�^5�v������mT�bS�e��7�"��@YU��<l�X	���~�������b���\}x��/U}x���<3�9��;|�W����Y�U���/n��{����g~�_�����=�;������W���Yq*���+�VS�Sk/�<3��C[������6'����=u����zv�������'��~��~�����S����������/�S���;�����o��_�����/�����o������QT�eC0�����;^�����8�_�S����/��?������?�1��>��������K����o�o���[�z��=~�����N��
���m����ROmN�!������W���Z{�<�~lx?������X�xyx
G`�8��~f���wL�����B��������������5�p8����S_���O}��������!�A�_���g�72��������>��o�i���O�G���?r~�e_q~����3��[����A������sj�;�~������h�yi����<��7:�����V��c-�'���	;�
��\��z�;r��?�p����yG�����S�Y�O����^�#mNe��m�����=1�p��U.Eu�^��������JC�(��Kz�{^�`T����?���/��n�4n����q��}����"K�=<U�%_��?�wFwox������
p�v����;����WiF#f��={�X�fkc���xo�
�����K7��������_���n�if_�'���+c6n�����{�U|%�C3�SW�l��o�-w��0�F����7?�t���5�V����\3� lv������
�J-M�����y��
������S%~���w������r<B��q�~��#����������y	g���1�����\���5F��H������������Vx�����R�v���	Y<��9y4yE�����z:0-x�>�*��8��B��D=����S�}6��K��!ST#���<9���0�[8�����c��wn+�i+~��N��`�g��+9j�2n�����;8�
w�n�^�����������!�10��}cH_B�Jp���D����d0w}M6�b�"7�-EM�S�s�`��O��p��
3���oF�5�k*���O9op��~��6�Fp�&���J0��.�eklYE��X^�.`�0?���	5u$�7���V�-�g������y+��������p��&�RFZ�[z��=��[~�q���bx��'6�}x�V���{���H��)��
C�C�O�z�E�(����c 	�e�R�'[���~K���T@��c�E�)��=��?�{�G�go�F�
��h��0~a�rU[������� S�\bu�	>D���V��G���������+����-r-vR�E������NW����xa�)E�qK�zT6�4�)
��c���9-9���@��|M]Z���q>��Xht5������8_U�8��v��l�b��.`�:��9��Y�s� �0�:��h�����9���z�9��Q��'%4�O%�H�D��~mH�����~{��_x���%2��I�\�k~3����H�q'����I� ���u��7���)t�hGCJ���V]o��1���b�f�\��T7��i4�K]QU�?-NK)Qg� �����^�
=ymd<��	c�6L��O8���y��0�B�oE�	v�ZBY��e�����T��Q�E4J8��B�.�4��%&�J��[z��6�\$��E��vA�>S\5�0"�r�u��j�����R\�I&^g�*O��t�`Z�io��<~��p��|�'���/���g,��,I����8�M H���G��<G05���������]��<~mX�	/O��--�{�s-0��A����� o���3U�G����K`����u*�����$���*e{��G���#�@'.	�3���+��P�w<���r�����c���\�����G��/Gh�)G�2�b,S����Yg������\��+�S�0�s�a9!�QE���+��������%�D����1�9�G�B�p�$Nxn��]���4@2�=aO�Y)�<�l��c��)�0F2Yn�k��%�z��o�����^�8^�����FS�Z-�u�+��J��A���=y2!U�

�S�	�d#���A��l�0Z[�a_��h���o������\l����>����}wit��Msu}����je�J������8��K��^u"E��Ep�X��t�e94�ou�ts����.v����{��jU����_�;0E�qh�{l�,����6w���Xy���|l�>���n����E-d�G��aJ�
�B�TJ���B-'��,��@��P^J�IY��q����I�,���e,�F��_y������;2[�)0�oW'D����5��EZ�2�],R�O���J�%�&�Q�B�f�y�O�0�/��t�6��)��r����sLgB
����S7������I���w�Ov�P�!�bJJ)|��w$��k�����k�M�0)������&���W_�KpQB������t~h9U0��Cb����?AD1�|��blj|	O�)�W�
����d�Z�9�I-S�L���u+	C`���&�~hd%y��hI51]��.��	5?�<#sa��3���=��@��������
fA'������N�\����xI�AA��������� Vr��2�,z%��b���g�K���n��O���_[�������M�F�*+P����n��|����$�Skg����r�C�����s�T����������sR_���#R��0����.�krz�o�����G(
Zc�k�����P�5La���$xn|\�A�C��q��@mq��g3z�Ln��n��~M��s�;?�{+`����<��|���u3;�����������RX�
��!�
��9��,��.�&����|�@+�2=��.�tJ��a_�!h4p3��|
���e��^0�)������!;�'|�;Yz���z�~����s�nA	���m�"�#�2���v���}�Y��ic�8_J���@�:.�����H�ZR4�W8��E�~Wh��(z��Y���?������HB5(`��K&R��R���&Q��=����}��r�d��"BK
����]����8���a� ��B��9�]��s_���_1:���}2�{�	Ta��s��F�)B�B{���"�`��7P����y��
�o��3�j� �VmQ`����0i�Kv��dzv*��:��E�"����2�O���V�:����h%�{�P��!����et�����,�R����L�A���4�$��_��%e��
e;��^��#�Xea�/�g��~���^����%.�h�4]��XrUYn.����������!{Qxt�?`�0z�����D����`���]�����f�n6�9���	�;A�p=�p!��P�
�*�y�i�+�]*�W�Y�&���*N��UY��6���
to��g4���.V�oYI����~��,Rt6F�hB�_����������`��u����@��NP���	��&�r�_��hW�*B}&.y��0�\��]�'��u?�B�-.���r���^�U��(& �e��F2^2Jh�|��:�{m'>�D�Isj��Mp�s2X��T�z��IB�0{�Q�M����������%���4�/x#U��e��$��y�,���Y������ac�`����4&@�&
�*^Rs��PzP�EQ�>lQX�Y����H��;�@�2�e�.����[~��+���������/�N����8.y�F�NzJ0�Z�	�M�'�-���N�z7��5�bY����z)��7��z%����`�2�����R�9�z]R$���(����g
��LX��qc�%��5����t�s���M�`��Y��
0O!��3��`�Cr�@5��~+P�|�
���`���k�BG���D�3���g���\�LS9���h*��H/n�iT-�a�P��og�==��p��|j�b�u��R�FT��(������l�Ys;56�@-���@�f�����f�[��K������l���{�����-�x^����5�^3v��Y��8��h�Pg_�S7.�<�_�u����jl�@5Rsj���6v����Y�T��h�I��$|�9��4(�f;���t�L���a�nNovkYb�<b�<U����%$���T�V����������g�[C$2O��	UE.�����Di�A�9���u3Q�����������`��� kq	_�*�}P��xg?����i��c��^������G�V�3�r��a/*"hp)�jq���������r0(�0�:uaaGUA����%'��Z;��aY�
�T�hZP���G2K��@`��.�5���F�[�����u�	��t[�E��p�F��H��5Z�b���'�*+����2�	���0�}0U1���G��*����0v)}7b7�QA�������n"����V������F����vt�m5�`w�M�2Jt����������Sn��f�pu�f�S��9��J�a!Tb������0���j���u��Hj���3����'+J!���3%�rs�d��J�=5���>^���U3�lX��aJv&M�&�it�$g=�`d����h������n������E'C�����#>)����p��
m�S���z6��4)���8I)Y��s��#P��J��/�U��`<��b3�>oi�V��BO�"Ui���S���g�$�� Hl�$�J�@�=�;>�5�f��#Z�I�>��0J�������zlk��6]G.D/���;�|�
N��gMV`�
��� ��*����>S���b?�:4�W/�����D��V�h�x
��1��8�+��I�)�(Arri)��q$��?KT�Ru���gQH��d�z|U���aR�P�;1xl�{B���o6�2�n6)��-7��Vj����;�N�J�%���@~�q�f�g7-���uJ(s�-~�qu�o_�U:b����D�� j�-2�2���e��������>P
��|�x�bat���4���w`��5�E�<��ISF����b�}���j�z�����R��LZ>����)�{��)Gn�X��v$��O�9`m�����m�����e|�R7F�m[B���'�l��T�#yz7�.���cbj#�P�O�MkftI�&_k��.��D��A��s���_&�S�6L�I����/�~��?p��W"(%�����MLk*s�e��������p��yE�K�y��5|�CvZ�t�Hw*�g��"�Q����2Q�L�!9��PhO(����*)��1:�D�K��x
��f/���LF�u��%t`1���<f�L�!%����B-�(�������F)��k]0�����f�����pA�p,��H���0��@g�Y�J��%R[�������f�����R�����A7�o����?��	p��\W�(�����4��
����b7��!���S7�]4����pn$�����M�i�$�����y(&����(jE����"/>�����C��T!vW������a��w3}���s��z����y��o6i�[��H(�3�����������U� �����\v�����^&h���G$���~,����O8�/_��:�GCD�����g&����x��Aw���Q*2�i�`�^p5l9��
�B=|��,	�r�E�APT-i�D����Q>#:GX�q�!Z�YWuG��8#m�[-���TC��xh�y�4���y|�����
��6r�����#��}�V��A� `o)�V���dO=e���(J��6���-z"���Y��^��Pq���'�;]�'+�~C�7����S��}�����%]��j<k:��p]U�����B�b����5W��@�h���v��U��F>�S ��+����JB['�
� �s���\zl��B+��,>��Y�G_��������,�^s'���������3����3�����0���[�s�T���	�"e�6��g6�����<3�9q@UURtU`Rt����& �e�*~�{gz�
n3���A��:�f==��r���Q�h�l,�pE�"�}�=fv��+q�=
o�E/�������UI���O_I-�\*�	��`��Q������L�^-��e���)�~��"�~�����Y�$��{��J/�07�����t��^��`W'�8�4�,��~%>q����O��n��S/8��N!���Q5��y��h����;'PjKR�
T��_��w)$�����X��wP0�r;J������0	���
�����V�gd[�e�[Ao[�b�g�����K�CRR��dC���������F��\k�+D���
������Q86P-����6AnE-��!~!�*��fb+F����e���G����>����2�������j5����@�r��X��>����6���<�dF y���[�Vn�T�.�@%�@N�����h������p���d"�y��D�	�*d��T�
���M�
�w���Q`����)a��o�.�o ����^����(K(���L����es�\�0Z��A��%���sTg�#�;O��z�j(pE���P��%�p���6�����"]�q��+-�#���W��tM>^��-1�,������kT9_V	�^Q��:�5=9�<?����%(���K��*�>��/��$��
Z���j P"����2�;$����,�����c ���;�-��v�v�:�����[l^�]��(h��e�E\��5�T��:.h��@�$M����rM�]��7:V*\�)�g}���q�G��>T�6�����:8�����N�E���U�)��O��HT���Q���?
���!�3c��\��"���uPn��E�k�L���M��[���B(#Ai����2}5�V����G��T���`1`"qgV����}A��HV�c]p#��;��9�
��|�,�t��,(����gA�!���r��y������w(�Y�>�,�N[��W=�c���Hs+�+�#Y�H�Q8�]���#yq��/zss@;
W��i���8����T���0�-�_?�.�r���Zh���;h����.UE�1Rr�Yv-D!��� [�lBv7;d����������=�<��u�&��&+�k�}�L�D*`K�3����y�H�2E���2��SWk�P��v�r1N��3\pM��t�~b,N��>����[|Mp�<��)8���ycG���� !��V*�	�!C�-��=V5�5!	%q�c�,�\G�t}$���S"rc.�V�
�Ek�F�	�U�h��:�����u,��^i,����� ��Z� �G�_E������I9��n4�_p	�O��S����a�
p�"
���j����$�&(!&k���1��1+�nyRT;��4.\,1l������$��hM��7�$W�`���{Q}������4k�E����R�>(p�&z��
~��0�K�3�B�:��pe���[X�h�����C+�o�$���G���0�Z
d�/�cl��x���oh�d�3��-�D�����"���~V��@����X���^X��vs�D���1�\��)��
���5�R��3�B�������.��<��GNuZ#�tG��$��0\�������������	�
[V�\�lS��.��U[]���������]G��5�Y2���Og��{V���W��
��4z��'�4Y@A	%<^����J5����}4�������`s�>�����Yu��%��N���D���Pv������^��)�*XW�;��@�i��{E0i�����kv����[�h�h3�h�l�����2y�}��|n�!��1C�_��C��NP���I~�sc�iZ�����[�x���\+�����!%j8��Y��z�<��R�����\�S�|��iB�����?��D)��nX���R�?��.���[�H��~E��R���x;Ay����i����{tK����6�����z��RV\����Y"I�d��v���������-�d7VE��}��.P���f@�zx�FO&�����N�����K�F�B��Mp�vZV4(�f�,i/g�j��W���<x�:@�*���p�*�`g���[}�������]n��F�����W�����R
�[$1�D�������n�#g�
S�wN=wO=���;v�#�l���J�R�z��0����vFA�P��8�LH�����LTd0��2%�6	������S�i
�\���Hpk���b�^��"����.�*W�0j�O�����+�p�!��-�x$:(X�ra��A��
E���F�[�76�,
�"u��]V�M\f)�t���^p��,0���9oQ6�u��#�m]�Vq����^0
�0��LS$��WGAs�_��[�"%d�{�T�rN����/�c�������+���.n#�^�^<.`
0W=_e@��O��?�+<����X��F'��q�j/��@�e�����jb�"����L��X`��3D�c�K�U�T��X�F�eQ��Y�
���h�!��*}K��/-��K{~�!��6�j����jN������RD���g�Q�4��%������4�����J2�F��7�q�~��(Y�����7L]KIV��{���L��K�w ���J���cg�9�9{��}A�D��"��h`[bL%������d��azwc����mE���;����M.���Jz*e+�!��n4�B1��Z$I]��/��9�`!�b�:��u\K�������*-NF�3}�K���w�������0d���y�l��f����������`�	o5����?�~S������j�bZ�z���������.��'��Z�T��Q�tB���kcMSU�
 ?��+�����72�J�0W�>"h�	����b�y~@�g�8r>���I?��%�����m� ����������p��Z�d)�v��S���=��b���:���pMn�<{����8�+�8k�C�=�2��!�h*��uK�g����M������F�i^���];g���~��6��Rv�?��<�	�g��G`��}�% ���:��}`LU�97���`/����v--=oN@�F����F9��&%Rg���@��S$X���k�x���tE�lI�E ~�0p�`�N����Y,n��)#}�zb��`�y�V��b.���H����E���B�`_zG�-��L-����G��"q�l(8��8b����E�����@q~�!�p;����`��vnI�z����Q�#v���0a>y-�z������c�����.��\*F�+����,u��Jtt����b��*I(dI������i5��+���� ul�{$��Q7���;�����3z�+�u	a~�\�w���������\�|���>��A*`h!��.�-������������:���i�D�^?i����M�$"�5�����Z�X�YRD�PG���vi7�est�>p������u�e�P#�N�+��U��������������#���tN$�d�������Ei�����$l�$!�Q�:@�{O���P�$bA��:��T�t���b)�Jj�=2�*x�AK$g��n���X����Z�Z,��S6����>��J���ev�����D�wDOF#�/li����{�P4,%vQ6�u$�w����:�����Qa�����>��/��aE�>t�6n�eKc�l����P-4S��)���pOE����I�\���F�wv��4��7���!�������P�p��^��D�k�P�W�c���:7��dAtZ��42�6.f�� ����,2,��������������F�=��<�����W��U�$��D�jN�o/a�?xyx�S�e���[��-���s����Y� �F�w�+�'[Us�Rcf�.L�7�n�cr���Z�g��?�^�(�j�����VaBH]	E)���<��d�o�20�����&Z�'ujuC�hl��?r�����m�;������x�d��y�L�R	�xZ���y��T��NiU�O�}��{�w�D��u,L)�m�
�nVu����M����J�;�0	_����OCP �LV��j��o�<�5g=� ��C3��,���&��z�Bi��#�H
8
Z�+($���e/�����pE�jg� h�D�X����"CE�R��?W���@Dk��;���k��r�"���Te����eg�a��u�g�#���3���Y�[SVc$�F�c����7
����'T��d��[�k��%<j%NL�)�1����2�p�:�>ZH#�b`��N�A�7A�nAFf�����zC�����*����
S��t����a|r1�NN"�gD@��I*�[_��Q�{�}o�9����SNO�����!.�B^������m�b����-��.ZW���2����|{����S��hR$���Bo�i�����zW6�W��	��H�ba�>�Ki�������KDX�i1{�4�9IYMtT�2���B�[fY5X�U��T��$��d���I�l�`�T��j�%~����
�?��d�J_L�\����WU������C��KH �+iMA:tV���H$=.��$���'��
����H	���Ml�N���M��bB��(|�AL�
�j�-�-���{\(��	�����m���T���n��$��qx{�Lu�v���x���#�Z<[��
 N����"S��,J�au�<�������$���w�U�'��S)d������ ���$;���D�����OS�-���Xc (F@�����cumc8�����T�k���Yh����FEL\y��8����B�f����NDi�&{ ��
��_L�+����n!C&WCs|=M���cnYz�T0=M�����o��AJ���^
Y����$�Od$��4�.q��G�$=0��no���8�����XJ�Dt�������P��FM�K�V[J'�:�H���$pLj�����<��m��~L�8XX���j�m��6
m���AJ���6I�	0l�%D��6��.����K������;6�6�<)�\j_Kt�33��t' ���kU�Z�M���P�?�������#`����I���jx-@�%��M��t�����_�y}���f��\���#+�����4�����W@��c%3������!@=�6�R��gH('�a���������!J�'/7J2���x�|U�oxfP0�uY������r�dXh��K�F,��[�%�_�,I�������~�F	#��G�-�_���k�dk�@��:4���>�������6*$�M�����
�n������[56	*h����W~t��@��F\�3z�>5<�����7����BJ�KDK",��*�&�B�H�����DJ�F�������X+��JaQ��������<�V�=��J2[����&��x��0 ���#��X1�&�5jJL��+I���lZj�����d�x�4L�U��%�����aoE�/],A���6\�
N"���$�_���3+�H��Jn!������H�[��� ��v)�h�k��w��b:�:C+�A�*�����;�R!��^��k��d\�K��7&��QNh�]�l�@���yM����������N������'�%Cl�3���������Ih�t�#�)FtO/`EDQ�
h�6��#��"�R�L��&�1�c�����]��Q�Q�����s�h�C*F����z��u���og�A��.�y4����dJ;�7��P~���$�qx������������6fr�q��������
�/�J=�XP&B���lc�����T�Hn��a���O�j�59��K]�a��.8��LLW���i�� �$���&K�n�v�"t��e�����.s�m4������.s��W
����e�@��Z��P��6^��O��4f?���y5��_�k��Y��75�@�������
U6���N�fH2�J]_���Ah���~�����)���[_��?�7�������������g���FTj$�4<GF�1a�B����Xl�/-��M��w
�B�J.���|^����`� �?�)���*�
��E�E�m=>���(������I&S�������Q�	7>��<�Z�(�����M^���y���W��mRB�mE�>m��T��T*�/�'Z����={bq��c��=�����X���=�kOi�	kX����G_���2����{D�1��x9���Z��"��"M�z����AXi��o���_�=tt��6E�,��_�;�������
�6�\r�t��=	�+��:�B�{v���b���n���b�#>x�i>��H����%Z��w�u.��C�N�%.��s]-sVQ�����Q�=bxU�OwO�}$����2~=��6u(}*�;�<
�es�Q�-Q�����������)���}r1�=�(%,5X��������j������BKB��<�Y^�a�1�j�C�\�s���V�v���)'V"������g�o�
�����;���oOV�<Y�O,��Q�v����l|��v�����H�p���.�D��e��\l!��E!t����k��_`D{�*|����5#��%���`t&��j��N���q|C�Y��:�5k�P^�K-���5��x��S�K���I�Q��zi���Pa+W}?���Vi�3B!����H�Bt��iK�X7������|�M��M
.����k��*�h3�F�bC�^���GSu��4���B��k3��A/�/�tEG"���	.�>���J�ugL��I���%@K�#QtcQi��A�����83�_��e��m����I;���x����B�*��KEie��KND1(u#P�[�����P�_���m�p�^����+�t��w��D�8��_�4`�/��"��
��������Gt�e�JcIY�&I
Bh���/�i\��*���X�-��0�
U��j����-���6��A^r�|h%&�AA��Z��3���~�!SCNM>���.��i��)e0�t�K��a�NS�?�e�!RL�q��m��VKMze��X�����a���Ct	&W��t��3-��jG�W8�����$��S���f:~�wmz�w�:'�9CS��L`s���1Ex�b4+V�`����Z^J��2�g!��������(��U�1J��l�� �Q��o�N!�`�j�6?������	�����N[���Z�B��PH�z%�~Ts	�[�* ��c���`��Y�dT��J�����#;%[������Z����E�b��,*%h�����t5�X?�$@InQ
�P����
&'��h	.��p�&:g��*$e+��EP� �b���FZ�ER���
�R��*���
k�����<�������Tp��c$T�b>�%p���Cw�/�����F'��U�6��d8r/�l+���!��Z�-��&O�{z�j9�,�_�m�9��2>����"&o�#����Z��3���}?������[#3/��m�BV�7c9w���FY�|��_\<�v�U[�AA��Y�j��{��L�0�_^����:k����)n���>�P��Bn5{�/�p�o��b>��r<���{��5���%�}�R�� _e����*��h���Xz2�&(�h=eh<|��#f��+pbK��z�{��rb_�u�U
���)7[(��_��kN0C_���qp��]�9�dY?����=���"6:N�9�F��5Lk��H)�;�Xkb���@��X� ��<�kA���(��}��S��6!�A�$<�A6�F�&�t��+L""_�\����N���S4	���Q�w��eN���5���#������m�D6�6��uC��%������G�,<50�!@�����0��|�!��%�k|���`��C�$�Aq�F��v�P����K�:,�����$~$���.���J�t������-�	��z%�.(�I�{����i���*d�|��^��F&V1W�:R�7�K�74��u$�����n�+�����x�������:CI�i*2:)�wM6��@��L�2�C}HiO
��0�Rj&��4�V������.�I���eo�g��
�P�����F,|�D.�8�)�M+�W�\~*�n
�KN����
�.�n�UR�^�w.D�������O:xse�O�30.Z�Y��\>�i�T�t���s������E�Hk>J[V��#�He�,LO���B��`��-�A���K���/L�}��� x�>"t\��Ei��,Q�!&R��+�h�*�.��3p
1�j� ��2k���M���vE
�wt��s��)�S������1��N����FuKP�$
����O�px�S����L�1���'��S���.���6G�h�q�hG�'��
�_��z��9��Ba��Su�+�G�	�/O`	4l�7�;]��ByF���=�sb�"��C�#�m���{���m��g��u�9��6���p�w�:Z����{#��Z|rKA-AO|P�L�#zZ<8E��xs`���)
�(*)
1%��&�\�'@�������&�`0jf���.��7"s���7x��2d�P���C�����������pT>���%�.%MD�W�������.i=��Rhk�i�(�dIn�s?\�.R1I �B�I��W�����I������,��2��A~�no�"��R�>��S3�p@c�D�������u��s���cwB�a����\�N�X��P�J�8	�)�����L�A��A�X��$K��������g��<3f�p�EA�	���%������0o+e�rJO�����Z�{���)
��y��)�D�o�2�Dq�iLY����I�
���u|/a����2g7%|L�E�K��)e��G�������Rc��n���F��r���x����l��J��A�����BYiB "�z�/��Q[F��}:�{��f2�z����]f�N�7�����p/�MaK�9 �����;�)��
�"0w����,������f/-��w�|���1�g��+H����W��{%�Z+��%i��Qc�(U��gA�|V��ge��"��MD?r���@��M0h��z�P?Z���L��I�th���!�P���>�)���#��t��n�s����`��c�
F������������y��������U�Pendstream
endobj
44 0 obj
15495
endobj
49 0 obj
<</Length 50 0 R/Filter /FlateDecode>>
stream
x��}Y�G�����W|o#��%����tC3@V�9������d�
x~�dVeF��Y���8����##c�?=��u&������'?��`����?=�v|����j��8�p~������'/�|���������=���������V���W?}����0�����8���ir��xb�o���O�_�����������<=�����cj�N������=y�&�d�w��pZl��E��YiN�������>y����t2S��<8�������=|�:������_���[�>�U�oYc�a����La�'�y��Ny[�iF�~E����������,[P�f���������m}�����^�n�Y�,'��k>xX��3x���M�����O>6_�4o���x�o�f�,�>g'OPh��?+�m�aO�9x�5���z-^�k���}���nv��3��|��Y������PV?�,��lo���W���oz����/�T�9��`~xf�)�Gt��~����O�^|���n]@��&h?���S����M1����=��>0�{6���u�n^��S:}������%�F9�?���q:f	���
�����s�a0k����x`'?t�����H���i�!X//�O����4�]lNwh�_��%�7Y3SD�v�q:m<�t�va�8-�8��+�2����l�i�3[8�����U������o{�����������#Vd1��|���o�����Ln#�{�o��9&Sz��9
*c��"��X�<#������q�o�F7(�!zG�e2^"�c��~��Y[%����J�M��.Le���G�R?�o�H����8�:�����]h�2Og��s[��Ip A���qy�)�;n?�@v���3��/��[�9���<�C.L�,�LQ^|�0�PD|���
��p8��.9��^}�����"�}�U�4p@���������ob�0�u%.71��#�hn�$��}r�4)�/�~�?~��L#�����-O�[~^��b+~�W�<33���
.�'oL���?J�
C��8nE{��i�9�Gg;�b<����La}��>�-��Gm�!I�iN`$}h�	tt�\���� T��z<W>��|i����h��p����j����)WYw�7J���_���M�N����!L������N#B����+�I������	��������?(�m=�uiP�����j�[���9�"�����;�L�mt��6����O�.X/�3="���a���I��-�	te��%#7!����eK�`���\���V�X�����j�+��;����>�W�0�qga����
31�6XH|��y������[0���9?�G>x~��v����0���w+XLF~�&I���w�y�}r&��^�����g����"`����	�Y:���R�����R{��a������s;���?�	�I7W�������n���������@�B�{�����D��-�2&1;c�Rx��n�]_���5�t<�����8�c
R�\J7x�pZ3��~I�
Lb>�SB�3B���3��� {?�����48L�J)���2��+�U6\�o�uZw�e���D�G�2�\��N �H)��Ra�ZdM�B�O��E�����E���\����h�r��/�M�~�y�o�c������+��O�n���D�������hG���m��9|���K+��1��w�'����oEX��(��{�����Teq#���kaw�Kg��K��o�PNuW�����V����a�{�"\/�65lO�1���/��w8��t]�i��b��J���B3��}�R�%����3}���]lh���Fz#�>y Bs��u��9�������6Pfe�w�@���
�m�7�=Q�m��x�#F5 ,{l���f"��+'#[=OE�����O�'�"�a�n 
���
v��������rX�����.�;���W	kt�r���T�%�g��e_
���D^W�=��
?���(8��5v>�>{�|ZdAB]�=lCp�_
F���R���e�(�����LL
�S����(���nyd�|��[�F��wmq&��
x�u�}�D%����p�4���x�F�6u����"��%� �$�z����E�g�|2z^Rq=A�R�s,\�&M�z��ZY�"tQ��l�k�[0�����U��yK���k��PL���������������L5���w�K�������������������tP�1cHK�8R9W���B������%�Tg�j��.��C���u�������p����G
�����w���"u���sv��(���Y��i�HR?����:i2e��x��(���B��v��rP��5���6�4��F:�O�������H578����������eM�-���S�7���<��9z�.s��[�����;|��&�S�7�}�1n��:'��^l�t���+���"4>'G]��w0���"uW�+Cnq�6��EN@_����2)S�7�+Ph��^������f��H1"O����'�(H7��rcW��|VcF>��t�D�Ay���[������O������~2�-�����
�<�]��=6@t[��p�|�\���C�g���!��f���w��5������
�������z�[�|�D�q��������M|W}�wUu#����I�U�,������a
|;�}��� ��7�����k��ne�5Q����>�f����[���&c\�d�e����IZZ���An�+���Ju�����	>��S�������=��1���u�a�l��c�B�u<�H���[;&�_���y�_ �����.q��aK���,85�Z=�
�ebj���2�K&M�;�q(������b+�Z"�{{wC�����{"�m��2�o�)^��r�`p����4�Ks�[�H�������XiI���'	�Y�����
1�B��r+,rQV87��������fh��2�����$2��0�%U�9x�$���4�e��u�]'4Q�?� �(5-�����V�&��y�%:�
E�!��y�3dD���Go'W������ZY.�`�J'+c�	_��y����a..�v�D{} ���,)�<�Q��f�d��H��N^��C���+�{K!�4VoP����6�O5��������`�K��'�7��iY�(Ff��2��{�����B������>Y����(����_��u�I .|Y�&�
f���x��v��.*�.����a���;^b��.1F��L�NF	Nz4�3�cH��-��m4�A�@�o�;�>E��'�7�8Q���AE�=���4o(��;�*�{��aU�������S]
NyI��t�c�$�U����7��i�US��_�e���o�H���P���}��co�KiM=�+b�(pED�U������3�
��"�5p���Z��:��(�~���#q��1�b���<�1��y����|�����U��Z5���^|B�I5���r����i�� �X�����pt�Mu(�2��/~ai�����J+R`bW�qW*�5�r?��������_h�"����f���S��;;���CB��d���/M����;�;q4uU���rP�v�6aw��C��u,�26�F�|0��� ��_o^�+E�u%��4�������M�Z�A��|�@��(y��������H
DnC�Y��2GS�?�L�Ei�m5S�{���p�j��z�%��+�����v�S���Of���Q������le���_��
��K������U����s0��i�QY�
hV���
^\��y8���4�� �7��#�Vk��x�|��d��������y��x�IMu��D������M�/>K�o��etR��X~E�G����i�(^p�1w��\�'_r�
w�)C��iC����f>G�P��=�����'�����Fz��lV����An��z|m���?��!�H�d'�'�N��t����m�Ioh��y�{����-����p7�t��JAz��6���8_�>I�1J"�i�U�[�px����m��
���E�f���0���)�'+�W_K[�z�����z��Y��t���0i��.ZyL����%?���#\��)����|$�Tr���q��"r�j�����������Oo���*y�`e%Y(%~86�0Ta�)���7�.�5�"AmFv/��W�:�4����U���:��Ku���I��Q<L�I�0%�[r��f�L���p���oy"��?�s���:��_�4��M�����5�h�w�W]�5�E��-���z����WJN����"B�dMt4�S�;9%
�(��:	+�	}���S�#�V���~���q�D/C<�����aG#��2&5��C[)<���Gy�A��=����M�S�|�]����)�H�D����pP�������@e)	����d��y��R9�n<��G>���R��g����|�[^J��n�=�)_|/O�(\�,4���J��Pk��q�+��M2�(�IV�����&�����%��9�G�w�������"5��J���ZX�q��+�D%K���	���D*H����_�N�������6�	M��N4(��Om�����t��6����7��t8���?A�"�A�P����������=W�
����_��QZE����9VP��F��r����2�x�SD��z�z����F�)�
6��)�'t>����i��p#�y@H��S�X_�E��v��F��,�U�wb��>Sh`P,�zJ����R�V�+��"�����p/��H�|���g��wJ��`����z�+<j�EiG]�<�of�[WU\1��'s�I���%%(%�����^+�HFi���:�L]q���h������v;��W�A�Y�N�����-x����y�����fI���v���{,�����(���k���U���'~�zE:���DFi�{�`�W)Ozp�'��iZ8��q��m�v�|]y���7�C���K�	����m�j���Q:����`>�VW���kZb��ZG{h%�n2��VTQsM>^~�wO�[6%]��!�pX+���:o5k��34:�N��pl����@VD�iu���Z�-I�M9b^�U��)�w (�Y�%xM_�/�>�$��]���t���S���2E���*��o;H>S�������^��E�U�T)�~�.�A�|\`�q+�k����a���l�Zn���������?9��?m�������d��s�x���+�:��g|���&��v��[����4��]�.���Qw��iD��b]W�B����L����x�;Adc�%�U1d��*$����aS�X����W�z�#������~,]*�������������TO�����5���Fa���^@���+2�j-,]�s���7����WLW.�/u��
�����)E�:����u���Q	����8-�<��i�b�=�@�G#f��{��3k����25�
�H�_�h,m�n����@(,�_�q��Q�TWz;����GY#[���v��2�b+8~�"��8R|�G����������(�W�|)\|r�����X�n��Y��U��M�Fh�''"���z��%�����y��,=��UD���'�[�����{�#�uc�*	�$��_�����{q�� �GN�gY���+���W���hpA��xC��k�_B����r�}���]������Gt�k����m$���u<�
��������U�����D�5t�3�=Boe�K�1r��7"��Q^���p�,TH��hur�,�3����z�^%a�|��1>�*���dz[���j���\��5sOE'R%)���8]�=�������iy����C,It�>&4j��#�3Dc����Q	�xB#�7�hzEA�^�a�%4�9��������8	'��S�8��U��/�����M	�)��S9KKSS����_3J@q�/��\�����$��iX��q#���w�;yZIK�����u}����Hwp��kK��_Keg9����Auxz�E�HsEdJ���qG4�|��Y$��"���.c��&�M�Q�v�U$%4B<���Cv�����p�����H:�3��L9�kA'�����E�����p��x���_������F������'���c����x�K�5���oI�8�V���M&�������Y�a���%�J�2�y��r�����}�r�W�-W�Nk���rWne��MW��/
9c��j4��c�o��n�I��s�{���#�59�k�p����%��S(O�O��= ��U�9�w�+��'F����tj�����F��"�\f�>���W�����}�X�n��]�ET��.aX7��=W�7��������i���)3���>���L���Q���*5�9s�mz�~�&6�z�F\?����#Z�S-{�Z�gqR��:F�G������=�y�z�P�d�!���/���x�=�S.�m�7�O�R���fz�W	�������	r�'�zAII��A�"u����i��17Z���9����=0!��$R������^3 �C����r~�d�"e�c�����XiI��Aw���q�&�V��x����:'����Oq��Gy���S�O�����i=���y�{�q�#���*s���G������S��{��0.Za�������z��J"�T}������A�����D�T"��s�����@?��.sIZ���B/�%����
�wv�obz�;�e�!/�G���*{�����_��t����/9�������G5)l(�N��w���&�n�R��j�t��Qz��P@^;�����	%���[�)�DT��)���^��^�4M:wUJ����K�i���^zW,����x5
\�m�}�%�$.���]�������p�L��P,�e�����Sr�07�S$[0�)��Z2��*��F��k�Z
O
��$���K>�E���Ds������$�����i��X���H��9�'?�zA��?'����4V��%v���D�;�a�LU�JNM�oCy��e�$�]D^_H�y�Y_�+_��W6i���X�������� t�D�*���e~�e^O�j�@���h���<��
�=�e���VC�F���e��~/������f�xd��JW�B�r�h+e�R,&��I�z	c����b�hNs8Z��&Y4o.��m�#J�Q)6[�K�9�K�Z���|��W�pU�H�����*�,��X�x���
�j�
����k9%�t��V��.�V���\���{(6y��	�&-�����$i!�zI@>BV��no����'����{���`;,�M��Q�c������Y�����J��R���%��[^����zk�n}Rk�h�����D{�ID�����A�P�z�w�U��(����@/�U�C�%��VHb������8��f��+v����1Z'�%����������/��u�lv�{�����r]g�)��DH5"���d���T?� ���c�R�}�4I�����p!��������)n�b^��ld�+���;K���%5�]_��Q��H6�@lH�y�q����"�+�|�*��C��_SR��<kq���I�'�G���G�ks\]�����t��#
e_w���/(�	c���{��&������@���a_PUW|�g/����	z�f7�:a���|I�����H'�;��+�mo5Z����v]�=���Z�}���X��D�*��#N|��S�_�f���$H�	q����2G(��JN)���S���$US*E���|�c	{o�S�������
#V����6*<�I�(Z�Fa^�
�P�������K�j��Z�J'�����H!|	i\S�^OO~���/��L1J���� ���D�{��m�����PB	�K�����~e+�G��=yn%��s����^� z�Q�
#w���������:�������������V;�V���Q�q�!�d��L�i�Q:yo���;�;�_�#�T��j��~n�������8S�pD��3��C�����CkA2(}��q���#73F�}7,������<6�Y���t>���1��u"�7�K�jw`���!vx�]��g�R�o���q�(�U�R����;����	���To�6:�I��)��\����v�h��o�������"G;����s�BUb�4��LYR,�D����K�[?a�����
� �u�Y����%�J��{��.}����r����hv+�NA��S����1d^%guo�1u���c<�l��<��!5!xKmA�)-I��Ru�F���z���=
�	�=�S@��g������E��^L���7�����s�D�����^j? 3�=��*���J���g�
���}=�	���"[A�1�|iw�DPA��z������&���r���O��Gf;��9���S���T����H�}Q���~��X���m0#;�z�(xk��C ���q��HJ`��k�|W����e�����X,�z(��r�s�r������(�!�� �H����d�T7�1����t��{ U3#���}u��F�<w���"v��PfQ�o�a��^K5)���})�P3i���_3�"���Y"����#a^NC A�������u��wR!��}��@����]_�~	�E�w$>��1�����U���;���i�B���{�3�N���
���f��$kjj�l�JB5�e�F��?�B
�Jx�u��L����|-�=��r�D�_����{��m�7�$�4�S
�O���0�`�pq���Lh�|����1{VW�5�g,�'��\�����L�+���s�*v���O������:���W����+2��n������"��C&kT��_F2���H,�
���RA"���6��q���e IH���,��R�����;��*�3�_a��e5xhi��:��	[����[���#�b�SL�;��M�;�V�:�2�H,��<��`Q����@�xB"�H�-y	�k��)����m�q{��z��;"����1�uoA#���E���YKG�u����[�S���=O�������(�<����,�x����r�9��rd�����P���@���S�����7�k�m?��k�;xR��
���C��
��u����"�5��?����O�4��\G�y3c�#��CK������~�'G�E�	9�6-I]�T��&�\fvf�b����%��5P�L}��E01a4��6�Z��i!�����bbc�JYm ��f�I��I��p�7�i-�����g�2���!��}B��7����~�Q9�*�H�P�!	������<��c8K\ ��+��]iL��y<~����d�	2<��j�W�4|����w!�,��4&n"�,g�����fn6U!�"��LFv'�ov\���+-
���P���
qQ>�c��Aa?Z�����Q���l�}���$��o��/Af������/�������/�o�&����?���7iK��	�GJ��C����Zv�8n8x����5O���/]_ta��/Y�{�JYn�V9-����KUG���n�"��[�
���������_�jw�Nfq� ��O&y��|2��H���5�'����;����f�
!���f��H�?*��Y�N+fji��!*>�����Q�c��b$��;M�T�����R�bA�^V7n�����8�93�U�*���}��B
t����g����o^���%�;�AV�yJe/��6�q���F2�$v�5��G[V��;�k<dS�"��nin5����������s������f�������%�5WQ+5����p
3��QE:	���3���i����G�/�R������Q���_��Ej#�S�Q���#1�DJ�\��G��4EH��:?��XT���&������)��~LV�F�XmZu���&�M���liz��a�h��l$��^q��g�?���t/5S+O/<K���i�W�}>���O5C�E�����@����3����@=f�Z$��n���>�1zw���������Q��Q���-M7�Gd-���1i��d<���%-�Y
$S=h��A��xr��I��*5�,*g,��E��=:uQ$]�b�&�&�V�5�y���$�)uIEIIjf����h���[=�E���{�j���^�SGP�/
��]�	8x�3�E�ZLt)JN��jRr��l��<
N��A�S8JgqE�)�,6K9�O�
��x�+�����_6��)��	\.o�"���|����g�gc����/�&��X�/�W�q�g9���i�?�<BY1�����9n�a�S1j�����0X\�!&��~�&Au"�� :/�UZ��`uM��+���b�QWD"+��b
 <
���_�0�q%�;	Cn���_`yL*��������?�I�5o}E�$�YH��b�8?��[~��h<����d�G�j���hk�
�UY�?���X�<7Y���PV�X~(��y$����Cu3��N�Y2>L@���S�5';s��N��vbFA���h/��wus�Qe�wX���i��g�va����i�NN{���t�9K��������]��XN��L�V`>>�7��'�w���R������%2kzyI��Z��������/�A$Yr(����h��_n�`�k%q��"�vU=�S,�3��Ij��oZG}xE���nJ$������S���_����S�uo����x����K�M��5�"g���8���S�K ��r�����]�x�dE�,���l��N���(�&t]�i�,�>��)���r�?�i�Jc�G�M-(D��R�h���!E5�m�������X@�-w�b���4E���y
���8����%2@_�-J��Z�6�I�D��@�Za����/�K	>��db���Lm��X�F��p��A�3�0{J�����Q�s��3m��/�Nh�SR�@�K|n��4�D��i&^�{aM���_bzf�s�f)������5DT��8���$n�nL{�������l��h�v�Ul�#.��a�:8��C�Da����}��^���*d{���
��������{l�/�j��{�wXIJ�ko�S�k9�D��y�.F�wG&5#s6�uK1��Z}�|e�=�F�\��>M���]� SdVp��>��I��"}8��'�m�9%������4'N����U�mZ���9b
�������u}���oL�����������{
�
U���:�Oe�%����o�����10�>�U7jR�r��_q����6�B77���>[�b��Rm��t�(�>�v>�Pf�� �����P���"� ���sc�Nc���r�3hr�~�Y����+"�wn��l4.L2�0y6��vj��xR�tHx��L�}^�`�p�@:&�����0����������24t����6zC���U���Q��?��k$������ �$e�%Z�{��c�+�*���������"E�DL���r��@��m�
�:M�U����������Gd�,�Z5��W����f�f3�?�0�\���x"]�w����G0f]z�@�*�#���p�lQW��~����T�D�2Z�Y�|H�9��4#uj���Z�C3����d������F�Q��e���������tk(f���f4��w?���^�)�$
�+v�Fi��D���4��v&�Yz�gn�������v�	�	������!�L�����*��u���"��RD����C���t��iR�_���-�
"I��O���XX�������q�(�Sn��2�\@��7~^�Z�q[(d�����k�����Eh�K7��+���Z�)��F&^��i�Z6��!_�C�H��)����XW�@z��bCzw�]��{�7I2�&��|�D(����8�v6$\���9��n�����G4�=���.��j�*��^6�����W�q4i��_���d��!]e;c���$�E�s�t7y�v�v�w-N\�<��v��=�,}eM&����nU��c�Y��r�u����wHn�g�p�����,�_hYp���v'H�E������bLN^�Mlz|�#�2w���+���+��E�&{���K��=�����9w����O����*�B�Q��z&��������0'c�����:}E����S5:>W~���$�7B,�����yS<Vq��I�}�C������B�����P�i*�my`��6
[>�e9�������"�q���
���E�1�����.��E��� D�b�A���`�����N+6�5�u;��|�h���J40���w>)9�L&Q�tZ�$ ��w$�ik�\<�Z�Nx��W8G�a:�:)`]u���H���h�^p�%lkGqa��4����pK��'��H4������U�g�s;�r/j����R�jE ���J��[	��g[k�����u{G����uw�Z�=�9�����4�M<s�����VZ�}���d^q����TF������������s3��at�?����^7���7��ay�����sP����Y�����R����%C#�SC)���jMM���E���]��S���Z�A�X{(rL��I�LU�b�U�_��z-��
@n*�_���1)��_?#���>cj���9y�O�`X��zN4��:��9tUAJ#((� ��U�6�I��������#��M����7
��Tr���&R��~����R:��X�q���;9q��(t�4�2:C��VE���jm�{�>p�U}�]���q�qDn����F|?qt����/Hzrr>������T|���P�����Gd�q�k������*80��9�d�,��6u7�A�����P7���V?�Q��~�g7�|Jg�s�T��B++g
;G'{EU�
�|<���1��C����k/������H{����d_rU+	�����������$�I��Rr<�>;�EJ��0gp����vt�l|8ng���#r�	=MO�y�Y�<-�3��������uAN��?������\�����=IC�_)��>Mm�Q%�������y��Ma��:����
��n�F�"��e� ��FP�8[�~,^H������?s�5�����%��>�2��Dy��p��T���I��gv!5s�������������89��	�N�` �b_���e�u��W�G���B��\�F�
x�!�`���zk����]2�F���AP��I(Z�>����"�d��S��-v��=%x?�������tu��
����
����-��/w�HC3-��s6G���LC�����c��Y]���r�� `�������S�iG�L���;%;�^�#7� n7����~-��3���*iF�~%5K���N�o�Sx),�L�W�������H ��0R��������H�-��L��T��>5�{L��D�B�}7���y�?��y��L����I6�]�m'`��G"}�r�Q�
`���U���XT����so���4�������P!���-�K_�L
�f�����w���dx%�)������A��CJy.��?��9���cN6�4��L"d#t��"j#��o�����s�����Y7��xW��|D5����7N^q�:x�o�-����ye��9S���w�BP���|(���
��Zv0�mx� �vF.�*�=�R����SF�����9����Gi;Y�3_��/������O~���'.�(*���6����'a���b�0n���B^>���u����������_��������_�z������<$���?�]�����~q������O����9��05���_�����8���M{�����02(���~����u��(����2
����������n^��6��Ug����}U'58��"�5?�����������r��qr)�b��^���.�2��rY����I?��AXB�a���������[x���7�	0aw�IF�=
��u��4�{�g9W�8Hi�+6�nnn�`�7���!)!�
*"E�[@�x�@���W%��G�X����
���	���9lod�gH���^?�SR�y;��'�c�3���P�C���4�?#������M���&N�4;�����:(oC���I_��z����E�D����t@W�"]������~����qv�����J���j3���#^�Y&���7Z-|�����!�j�h�j#I!�����e�������q����s��-�3�������d��-���/������u�n�,^��#^m�V[?(s,�U����/6���+6�`2���~���'nE��l��<��k_��+x!}~]#>�B���~jBF��)]mJ�����!C�7!���/ ��� �~<@�X e>^@j?3���V�~���_��7�L	A��Q+���0��j��E��X���7s2*�u_�[�b
?G��F����, _,�������X���Wq���5b��/�9~�R�m��$�����/6J�l!
d�K��2Y��|���� H��ub�B�X���f�# V,�*1�/�~jf��fA�|��b���E��X������/��e�����.6B&�XY�MZ���"���at5BRP::�����	���_�����_�<���6�|i���6|!���	���-�"b�������i��i�K�f��i�Q 3_�Y������iONL�
����g4�|!���-!����3�2z~Fc�g� e��/mtbiV,���
�Y���b�����/v��H�<_� H�����7s"Y/�A����(��!�<�0��A���a��Ec^,�*�7^�Y��-���x���_8BzA!���
DPH/(��P�z�@4
�f�X#"��*K?���q���8�H����b�|!�r2B?5�C�V�	1bi���E,���h��z��B��b�����gd_���B�����Y���@4��zhF�#H��
i�4�BA!���
Q��,Y-P�z3/R��&�B�+�m6���h��(X�W��q3>xS��9X9W��H�[d����O�����d���}1P�����A�~�����]��E�O����_�DU�����qW�6<?Fv:m�0���b�s��vL��1��
��q���l���~�������~�L�A�3��dH�7�e� �I`��wlQ&Q�(s,=�U��g�/�;�M
H����@�2}+���X������DL����"���������%���1�8IX�����F��T|����W��?��k�Ky�Gq5�����y%Yn^��6R����
�P)�:�/��Ljd�j3c�pb�����W�+��2���g�{��2�DZ���H�Y5��+���V�����1���Ii�#�������B���J��(�C���	�@�k~'��������/���k���|^����������Ss�G���i�t�����s��Y���U�~E�xd���cd�*2'�h~���GZ�;=���Y���W�+g�]��������#��z;L����&������=s`�0K��Wn�_��c�k���9�:v������RWa"-zW�_������v���w#��c���y�����i���	��t���s&��
��4F�����@��0DH:y�B�� �dI��#d(�Dtd0)�&A��MRAl���P���\K�O��h�w3���&�3�$e���$X��)q�/}J���A��e������C�9KM0�������-��S��W�b��c�6j�;������	��	 �i����s.�^�U0���aOz����c��3��v	;�h��f�BB?1i��h6��(�����m��B�,rvy���������v��D�kJ|����L>y��d
v.$5���r��_X[p�,�4S��d�O8��,�!\�m���]����C��;Z������g�����N��y]�,��Na���%����X��])�6��������������}�O��O�vS��"k�L���+
��n�v�g����%K�E�Q!�V�o�Zd��3d���Vc�q0ziR3]�&�G�f���;�nok�w��`��' �KNS��D[�A����s��'42���3����
2BWC^��,0[�c�������yJY�@��,�~�5te>\"�m54!E)�F���%o8���D��10�HqS�����Y|S ��d�T  3��Y��,��`����V��n�X:�bj	-�d�������v1�liU h,#fh)d]����+��DR
r���6�of�R�g���=H���=��Y�"/m*�w����H�����4!�~�jH�y����������
�m���9#����r�p#���A������5NM�W��q���P���1��P��n�G9"���(@��j4�-j�m
d����������P�eh3��o��l�D�V�6C�
���3����b�y!E�5���$5� �L�V�c�=
cR[���\ u�����|��v����p�_j����$�-�d:�<�l�	�cu>%�g;�	��� ��
��;�uJR{vbt2�<m�.���Xpplw��d�#J8���C�:� F'3���KC1	��%�bJ����@����V;5!�Uf��NgX����n��{�.�Y�[H���v�!���o����$)�Nf��m2NB(�3��<`�������:��V^�LF��d��#�Bm�	}!/R0��2
H&S�U���&��mm���p�s^�6��s,�`��0���V! �v�AJ����
��d���::��������v�D��0���n�B�������2�g�G?���IN�\a	l��\!Sv��\�w�@�g�����U���<��Y�0O{���uv��;8U�8��2d����Q|��V����|�$��&i1�� ���F�������1�.����
k�R1[�B�l�p��Yw�Af����1HmeE��~���� "�B���"Rl�L����o��_'� ���G?���If���%�b� g�@�J+� �]����W�O���f:�6���E=!C���-����e��Q!@��@���e)d���>�6��xOZ�E����x�
�s�E��@j�N��)d���X�[�f��
��c�*CJ?��6�o2�*�:�O�dV��HP�����n����pS�RQ�)���^C��N�@�����,�e <������2����(�,uJR{6b��b��B�2�H&;A����
a��!6C	��/��x>g1�<�y;J?rg�rLI*�7�Z����2�i{�=��0z��p�\�$[E^��oD���;y�3��^j� �v�AJ�KQ�T��, S]�@��sYH���������rl�d��\]~��:%�=1:�a����,G&�`�2H������V>O�tM�HiU�3����~*��d^�C�@f8��J�����uJ2� h���g�v>�8���y0�T��B�<	����G?���I������	JRt����|( ��
���d�:��[<���s^���q�&'��Y!���
H"@������S u���s3���FX?����9#�w���LV��o2&O�C ��Bc��M2������y��T�2h�m�:%�V��A`,h%!u���,������D&���K+n����ub�$������$����U���%�`s�n�kW���1A|���(�E�s���o�&��`�c���(@h�	�yS��������Q+	2�U����b��m�CW�(�]AF����!�N�A�U6��b:A���
����9��*���9��,m0��!��b3�7!������|#��	A,@��x1�@!�������,-�n3$5��]��y��/Z!H��X�"d�df[B�X���H!�B���;�|�A�*��0
����G��A`����*�gX���-����\'����[��gu�dl�����VRGw�N;O�_����6C�\�R6�bH&��6s5�Hn����8h��<Jk��r�M�i6�j��j=�����������Y�.�2N�A@'�T'� ����g:z�R>�b���!k��oH@s��������X���q�.$4}��_c�7�R�*D��R ��n�L�P������FMHq5e��K�FO_,>k�=��
���q�)h�l'��2$c���o`,��C�\�����&���X�ry�A@m�0���A����T @�������P�D���$�pn
��x��rA�s�ZH����i���my��l��s������B�L�s�A
�Z��������c
b>#���fP_RE���1����$x&W�S*�
66 0 obj
<</R63
63 0 R>>
endobj
67 0 obj
<</R33
33 0 R/R13
13 0 R/R8
8 0 R/R65
65 0 R/R27
27 0 R/R14
14 0 R>>
endobj
71 0 obj
<</Type/ExtGState
/OPM 1>>endobj
72 0 obj
<</R71
71 0 R>>
endobj
73 0 obj
<</R33
33 0 R/R13
13 0 R/R8
8 0 R/R65
65 0 R/R27
27 0 R>>
endobj
77 0 obj
<</Type/ExtGState
/OPM 1>>endobj
78 0 obj
<</R77
77 0 R>>
endobj
79 0 obj
<</R13
13 0 R/R8
8 0 R/R14
14 0 R>>
endobj
20 0 obj
<</Filter/FlateDecode/Length 16>>stream
x�32R0��C.��
endstream
endobj
28 0 obj
<</Filter/FlateDecode/Length 218>>stream
x�31U0P0V�5T01Q0�PH1�*��s�P �"���9\\�
y\�
&�@�
�����\N�\���
%E��\��@\�@y.}�g �%��� �K�M���
�
�������w��)��+j����tQP[p%a�Nf����������/��s|8�"P������}���������������7��\c������A@DZ������B W ��Nx
endstream
endobj
18 0 obj
<</Filter/FlateDecode/Length 16>>stream
x�32Q0��C.�
endstream
endobj
32 0 obj
<</Filter/FlateDecode/Length 164>>stream
x�34V0P0accs�C�B.7H�9\q0m�2����
��\�Hur.��'����BIQi*�~8P�K�(�����`���m�e�����������{+�e��sJs���Z���<]�����w�������v��W�^�.�08�p�z*rr��1�
endstream
endobj
29 0 obj
<</Filter/FlateDecode/Length 205>>stream
x�36S0P0V�5T06�C�B.0?Hr� `(�������U�`l�
�l�����������PRT���T������w
pV0��w�6�2���w�wvvrll���z����sJs���Z���<]��9|���F�����K�~Y�����_��9�9���5}��oo������[�Kox�)�n�u��������S!�+��	?�
endstream
endobj
16 0 obj
<</Filter/FlateDecode/Length 15>>stream
x�32P@�C.��
endstream
endobj
30 0 obj
<</Filter/FlateDecode/Length 178>>stream
x�32S0P0R�5T02S01QH1�*��s�P �"���9\\�
y\�
F&@�
&����\N�\���
%E��\��@\�@y.}�gC.}�hC.�X.}7}gg7 7��F���K�9?�47�������EAm�������^��E���jK*��#���nsw�NX����������|=�
endstream
endobj
80 0 obj
<</Subtype/Type1C/Filter/FlateDecode/Length 81 0 R>>stream
x�cd`ab`dd��
	16L6�L60	�t�0w���q1�th�ww�<��	���I������������������������}��d8ft����p20���
endstream
endobj
81 0 obj
101
endobj
82 0 obj
<</Subtype/Type1C/Filter/FlateDecode/Length 83 0 R>>stream
x�uSmLSW>���P�x��w�9A��)��@Ju� ��������b����E�����M\U`�/E��)�L�d�W��D��>����C���,��?''����y�������(j���~����.N��l�b�Px���	�h��^d�C�+��q�������'�����U3MQ��/���N�����-����p���e\\�s�[��
E&���/^��������S�l�����fg��"N��g���ek��|n��l*,�������11����,�T��W�ej-E�&Ne��,Eb[�������O!4������-�]�2��h&��4�h6
Ea(�AAb@�_�v���8e�NQ���;MG����������]�=�H�\/��T���4���d�,a�r�CBA2���*'3��|����i����>�4H��U�B���h�L�(�)�G��qG5��<�
�1
m<OD��� �p�>f���16�)J�����0���r����;I������[_��u��;{���O�p/��1B�K���,�i��_��cNC1/�G�zCC��&����*\}�����9�Wn��V����Xq��h}��L��D�����ngf`��0���[���������9�<D�����f�]��
3(��|���;���*�N]v���C/��h��^L��CI��zc�����frK�2��$x !P%D�d����qxa����L���.<c��1�&�5���������_��d���$����XQ�Yy��q���*[MASf�e�������]�;x����~&�&��l���x�<�y�qJGhA)�d���=p��������D����!R`o�����KWkn��O|BP����/�p���9w]os�e|7�9���J��]��,~g������t�����c���fPBd$5��H3*��`'Ha�OF�\\�}���Fo�����Xa
�����J����~B�!�����%�%���]o9��3����L�'����8�V�ka~���]��cu���9=c�����l�D_�f�Mnz����{��}��C����j��1���J2�O�+�����w�������o�M^#s=1���o���)�Yv�]gq�����-�������nIN�_����[�BZ�\.P��\~��4�:(h�DP0B.LD�
endstream
endobj
83 0 obj
1195
endobj
33 0 obj
<</BaseFont/Helvetica/Type/Font
/Subtype/Type1>>
endobj
17 0 obj
<</Type/Font
/Encoding 84 0 R/CharProcs <</G03 18 0 R
>>/FontMatrix[0.0103093 0 0 0.0103093 0 0]/FontBBox[-97 -97 97 97]/FirstChar 3/LastChar 3/Widths[ 24]
/Subtype/Type3>>
endobj
84 0 obj
<</Type/Encoding/Differences[
3/G03]>>
endobj
31 0 obj
<</Type/Font
/Encoding 85 0 R/CharProcs <</G4C 32 0 R
>>/FontMatrix[0.0178571 0 0 0.0178571 0 0]/FontBBox[-56 -56 56 56]/FirstChar 76/LastChar 76/Widths[ 13]
/Subtype/Type3>>
endobj
85 0 obj
<</Type/Encoding/Differences[
76/G4C]>>
endobj
10 0 obj
<</BaseFont/Symbol/Type/Font
/Subtype/Type1>>
endobj
15 0 obj
<</Type/Font
/Encoding 86 0 R/CharProcs <</G03 16 0 R
>>/FontMatrix[0.0123457 0 0 0.0123457 0 0]/FontBBox[-81 -81 81 81]/FirstChar 3/LastChar 3/Widths[ 20]
/Subtype/Type3>>
endobj
86 0 obj
<</Type/Encoding/Differences[
3/G03]>>
endobj
13 0 obj
<</BaseFont/Times-Roman/Type/Font
/Encoding 87 0 R/Subtype/Type1>>
endobj
87 0 obj
<</Type/Encoding/Differences[
39/quotesingle
145/quoteleft/quoteright/quotedblleft/quotedblright
150/endash
237/iacute]>>
endobj
8 0 obj
<</BaseFont/Times-Bold/Type/Font
/Subtype/Type1>>
endobj
12 0 obj
<</BaseFont/UCERTV+MSTT31c59c00/FontDescriptor 11 0 R/Type/Font
/FirstChar 3/LastChar 3/Widths[ 248]
/Encoding 88 0 R/Subtype/Type1>>
endobj
88 0 obj
<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
3/G03]>>
endobj
65 0 obj
<</BaseFont/ZFPZDW+Helvetica-Narrow/FontDescriptor 64 0 R/Type/Font
/FirstChar 46/LastChar 57/Widths[ 228 0
456 456 456 456 456 456 456 456 456 456]
/Encoding/WinAnsiEncoding/Subtype/Type1>>
endobj
27 0 obj
<</BaseFont/Helvetica-Bold/Type/Font
/Subtype/Type1>>
endobj
14 0 obj
<</BaseFont/Times-Italic/Type/Font
/Encoding 89 0 R/Subtype/Type1>>
endobj
89 0 obj
<</Type/Encoding/Differences[
146/quoteright]>>
endobj
19 0 obj
<</Type/Font
/Encoding 90 0 R/CharProcs <</G57 30 0 R
/G44 29 0 R
/G47 28 0 R
/G03 20 0 R
>>/FontMatrix[0.011236 0 0 0.011236 0 0]/FontBBox[-89 -89 89 89]/FirstChar 3/LastChar 87/Widths[ 22 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 36 0 0 45 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 26]
/Subtype/Type3>>
endobj
90 0 obj
<</Type/Encoding/Differences[
3/G03
68/G44
71/G47
87/G57]>>
endobj
9 0 obj
<</BaseFont/Courier/Type/Font
/Subtype/Type1>>
endobj
11 0 obj
<</Type/FontDescriptor/FontName/UCERTV+MSTT31c59c00/FontBBox[0 0 1000 1000]/Flags 5
/Ascent 0
/CapHeight 0
/Descent 0
/ItalicAngle 0
/StemV 0
/AvgWidth 508
/MaxWidth 508
/MissingWidth 508
/CharSet(/G03)/FontFile3 80 0 R>>
endobj
64 0 obj
<</Type/FontDescriptor/FontName/ZFPZDW+Helvetica-Narrow/FontBBox[0 -19 429 703]/Flags 4
/Ascent 703
/CapHeight 703
/Descent -19
/ItalicAngle 0
/StemV 64
/MissingWidth 1000
/CharSet(/two/three/four/five/six/seven/eight/nine/period/zero/one)/FontFile3 82 0 R>>
endobj
2 0 obj
<</Producer(ESP Ghostscript 815.02)
/CreationDate(D:20071105154855)
/ModDate(D:20071105154855)>>endobj
xref
0 91
0000000000 65535 f 
0000156191 00000 n 
0000163246 00000 n 
0000156076 00000 n 
0000154620 00000 n 
0000000015 00000 n 
0000015086 00000 n 
0000156239 00000 n 
0000161476 00000 n 
0000162671 00000 n 
0000160949 00000 n 
0000162733 00000 n 
0000161541 00000 n 
0000161255 00000 n 
0000162052 00000 n 
0000161011 00000 n 
0000158556 00000 n 
0000160458 00000 n 
0000157968 00000 n 
0000162200 00000 n 
0000157599 00000 n 
0000156280 00000 n 
0000156310 00000 n 
0000154780 00000 n 
0000015107 00000 n 
0000034552 00000 n 
0000156426 00000 n 
0000161982 00000 n 
0000157682 00000 n 
0000158283 00000 n 
0000158638 00000 n 
0000160702 00000 n 
0000158051 00000 n 
0000160393 00000 n 
0000156468 00000 n 
0000156500 00000 n 
0000154942 00000 n 
0000034574 00000 n 
0000050874 00000 n 
0000156607 00000 n 
0000156649 00000 n 
0000156681 00000 n 
0000155104 00000 n 
0000050896 00000 n 
0000066463 00000 n 
0000156733 00000 n 
0000156775 00000 n 
0000156807 00000 n 
0000155266 00000 n 
0000066485 00000 n 
0000086508 00000 n 
0000156859 00000 n 
0000156901 00000 n 
0000156933 00000 n 
0000155428 00000 n 
0000086530 00000 n 
0000103042 00000 n 
0000157018 00000 n 
0000157060 00000 n 
0000157092 00000 n 
0000155590 00000 n 
0000103064 00000 n 
0000119739 00000 n 
0000157166 00000 n 
0000162971 00000 n 
0000161775 00000 n 
0000157208 00000 n 
0000157240 00000 n 
0000155752 00000 n 
0000119761 00000 n 
0000136524 00000 n 
0000157325 00000 n 
0000157367 00000 n 
0000157399 00000 n 
0000155914 00000 n 
0000136546 00000 n 
0000154598 00000 n 
0000157473 00000 n 
0000157515 00000 n 
0000157547 00000 n 
0000158884 00000 n 
0000159071 00000 n 
0000159091 00000 n 
0000160372 00000 n 
0000160647 00000 n 
0000160893 00000 n 
0000161200 00000 n 
0000161338 00000 n 
0000161691 00000 n 
0000162136 00000 n 
0000162595 00000 n 
trailer
<< /Size 91 /Root 1 0 R /Info 2 0 R
/ID [(�RKsyq�"A�7���)(�RKsyq�"A�7���)]
>>
startxref
163357
%%EOF
#9Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Haribabu Kommi (#8)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Hello,

The attached patch is a revised version of the cache-only scan module on
top of the custom-scan interface. Please check it.
The points I changed are below.

Also, the custom-scan patch, which this functionality is built on, is still
looking for folks who can volunteer to review it:
https://commitfest.postgresql.org/action/patch_view?id=1282

1. +# contrib/dbcache/Makefile

The Makefile header comment does not match the file's location.

Fixed.

2.+   /*
+   * Estimation of average width of cached tuples - it does not make
+   * sense to construct a new cache if its average width is more than
+   * 30% of the raw data.
+   */

Move the average-width estimation of cached tuples into the case where the
cache is not found; otherwise it is an overhead for the cache-hit scenario.

This logic was moved into the block that runs only when no existing cache is
hit, so the usual path no longer pays for the average-width calculation.

In addition, I added one more GUC (cache_scan.width_threshold) to specify
the threshold that determines whether cache-scan is used.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, &old_cache->attrs_used);

Don't we need the check that the average width is not more than 30% here as
well? The estimation does not include the other attributes already present
in the existing cache.

See the new ccache_new_attribute_set(). It works almost like bms_union()
if the total width of the union of attributes is less than the threshold.
Otherwise, it drops attributes that are included in the existing cache but
not required by this scan. Ideally I would like statistical information on
how frequently/recently each column is referenced, but no such information
exists right now, so I put in logic that drops larger columns first, as the
sketch below illustrates.
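Just to illustrate the idea (this is a hand-written sketch, not the code of
ccache_new_attribute_set() itself; choose_cached_attributes and the
width_threshold variable below are placeholders, the latter standing for the
cache_scan.width_threshold GUC, while get_attavgwidth() is the regular
lsyscache.h API):

/* needs: access/sysattr.h, nodes/bitmapset.h, utils/lsyscache.h */
static int width_threshold = 30;   /* placeholder for cache_scan.width_threshold */

static Bitmapset *
choose_cached_attributes(Oid tableoid, int natts,
                         Bitmapset *required, Bitmapset *existing)
{
    Bitmapset  *result = bms_union(required, existing);

    for (;;)
    {
        int32       raw_width = 0;     /* average width of the whole tuple */
        int32       cached_width = 0;  /* average width of cached columns */
        AttrNumber  drop_attno = InvalidAttrNumber;
        int32       drop_width = 0;
        AttrNumber  attno;

        for (attno = 1; attno <= natts; attno++)
        {
            int32   width = get_attavgwidth(tableoid, attno);
            int     x = attno - FirstLowInvalidHeapAttributeNumber;

            raw_width += width;
            if (!bms_is_member(x, result))
                continue;
            cached_width += width;
            /* only columns this scan does not require are drop candidates */
            if (!bms_is_member(x, required) && width > drop_width)
            {
                drop_width = width;
                drop_attno = attno;
            }
        }
        /* stop once the cached width fits under the threshold (percent) */
        if (cached_width * 100 <= raw_width * width_threshold ||
            drop_attno == InvalidAttrNumber)
            return result;
        /* otherwise, drop the widest non-required column and re-check */
        result = bms_del_member(result,
                                drop_attno - FirstLowInvalidHeapAttributeNumber);
    }
}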

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth that needs to be set in a clockwise
rotation.

Fixed. Also, I found and fixed a bug where the upper pointers of child
nodes were not updated when a rotation was performed; see the sketch below.
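For reference, the clockwise case now has the following shape (excerpted
from ccache_rebalance_tree() in the attached patch, with the redundant
assignments trimmed); the point is that the child moved from lchunk->right
to cchunk->left gets its upper pointer refreshed as well:

    cchunk->left = lchunk->right;
    cchunk->l_depth = TTREE_DEPTH(cchunk->left);
    if (cchunk->left)
        cchunk->left->upper = cchunk;   /* previously missing update */

    lchunk->right = cchunk;
    lchunk->r_depth = TTREE_DEPTH(lchunk->right);
    lchunk->upper = upper;
    cchunk->upper = lchunk;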

5. Can you add some comments in the code explaining how a block is used?

I added a source code comment around the definition of shmseg_head.
Is it understandable?

6. In the do_insert_tuple function, I felt that moving the tuples and
rearranging their addresses is a little bit costly. How about the following
way?

Always insert the tuple from the bottom of the block, where the empty space
starts, and store the corresponding reference pointers in an array at the
start of the block. As new tuples are inserted, this array grows from the
block's start and the tuples grow from the block's end. Only this array
needs to be sorted by item pointer; there is no need to update the
reference pointers.

In this case, movement is required only when a tuple is moved from one
block to another, or when no contiguous free space is available to insert
the new tuple. You can decide based on how frequently the sorting would
happen in general.

I newly added a "deadspace" field in the ccache_chunk. It shows the payload
being consumed by tuples already deleted. Because of the format of ccache_chunk,
actual free space is cchunk->usage - offsetof(ccache_chunk,
tuples[cchunk->ntups]),
but deadspace is potential free space. So, I adjusted the do_insert to kick
chunk compaction code, if target chunk has enough "potential" free space, but
no available "actual" free space to store the new tuple.
It makes the overhead around insertion of tuples since all we need to move
is pointer within cchunk->tuples, not contents itself.
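To make the interplay concrete, here is a condensed view of the decision,
using the two helper macros as they appear in the attached patch; the
surrounding control flow is paraphrased from ccache_insert_tuple_internal()
and do_insert_tuple(), not quoted verbatim:

/* contiguous space usable right now */
#define cchunk_freespace(cchunk)        \
    ((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
/* space usable once dead tuples are compacted away */
#define cchunk_availablespace(cchunk)   \
    (cchunk_freespace(cchunk) + (cchunk)->deadspace)

    /* caller: this chunk is chosen only if it can hold the tuple at all */
    if (required <= cchunk_availablespace(cchunk))
        do_insert_tuple(ccache, cchunk, newtup);

    /* inside do_insert_tuple(): compact only if contiguous space is short */
    if (required > cchunk_freespace(cchunk))
        ccache_chunk_compaction(cchunk);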

8. I am not able to find a protection mechanism around insert/delete etc.
of a tuple in the T-tree. As this is shared memory, it can cause problems.

I didn't change the existing logic (a giant lock per ccache_head) yet,
because I still wonder whether this kind of optimization would cost us the
simplicity of an implementation whose purpose is a proof of concept for the
hook. Is it really needed for this purpose? (The sketch below shows what
the giant lock amounts to today.)

In addition, it seems to me we don't have an intent-lock feature, even
though the paper you suggested gave me that impression...
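Roughly, the giant lock boils down to the following calling convention
(cscan.c is not quoted here, so take this as a sketch rather than a
verbatim excerpt; LWLockAcquire/LWLockRelease are the regular APIs, and the
variable names are taken from the patch's context): writers hold the
per-cache LWLock exclusively around tree modifications, readers hold it
shared while walking the T-tree.

    /* writer side, e.g. the synchronizer trigger */
    LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
    ccache_insert_tuple(ccache, rel, newtup);
    LWLockRelease(&ccache->lock);

    /* reader side, e.g. cache_scan_next() */
    LWLockAcquire(&ccache->lock, LW_SHARED);
    tuple = ccache_find_tuple(ccache->root_chunk, &ctid, ForwardScanDirection);
    LWLockRelease(&ccache->lock);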

9. + /* merge chunks if this chunk has enough space to merge */
+ ccache_merge_chunk(ccache, cchunk);

Calling the chunk merge on every heap-page-prune callback is an overhead
for vacuum. Also, the merge may again lead to node splits because of new
data.

I adjusted the logic a little bit. Now ccache_merge_chunk() is called only
when the vacuumed ccache_chunk differs from the previous one, which reduces
the number of (trial) merge operations.

10. "columner" is present in some places of the patch. correct it.

Fixed.

11. In the cache_scan_next function, if a cache insert fails because of
shared memory exhaustion, the tuple pointer is not reset while the cache is
set to NULL. Because of this, the next record fetch trips the assertion
cache != NULL.

Fixed.

12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+ cs_put_ccache(ccache);

The cache is created with refcnt 2; sometimes two put-cache calls are
needed to eliminate it, and sometimes a different approach is used. It is a
little confusing. Can you explain in comments why 2 is required and how it
is maintained?

I adjusted the logic around the reference count. When a cache is created,
the creator process no longer decreases the reference count if creation
completed successfully. If some error happens during creation, the count is
decreased, so we can drop the cache with a single "put" operation; a sketch
follows.
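A minimal sketch of the intended lifecycle (the real creation and error
paths live in cscan.c and ccache.c; this only illustrates the rule stated
above, and the PG_TRY block stands in for the cache-construction scan):

    /* cs_create_ccache() hands the cache back with refcnt = 1 */
    ccache_head *ccache = cs_get_ccache(tableoid, attrs_used, true);

    PG_TRY();
    {
        /* ... scan the heap and fill the cache ... */
        ccache->status = CCACHE_STATUS_CONSTRUCTED;
        /* success: the creator keeps its reference, no extra put here */
    }
    PG_CATCH();
    {
        /* failure during construction: one put drops the half-built cache */
        cs_put_ccache(ccache);
        PG_RE_THROW();
    }
    PG_END_TRY();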

13. A performance report is required to see how much impact the cache
synchronizer has on insert/delete/update operations.

I assigned the synchronizer trigger to "pgbench_account" and ran pgbench
against that table with and without the columnar cache. It seems to me
there is no difference larger than the run-to-run variance.

* with columnar-cache

$ pgbench -T 15 postgres
tps = 113.586430 (including connections establishing)
tps = 113.597228 (excluding connections establishing)

* without columnar-cache
$ pgbench -T 15 postgres
tps = 110.765080 (including connections establishing)
tps = 110.775890 (excluding connections establishing)

14. The GUC variable "cache_scan_disabled" is missing from the docs.

Its description was added.

Thanks,

2014-02-14 14:21 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

On Thu, Feb 13, 2014 at 3:27 PM, Kouhei Kaigai wrote:

8. I am not able to find a protection mechanism around insert/delete etc.
of a tuple in the T-tree. As this is shared memory, it can cause problems.

For design simplification, I put a giant lock per columnar cache. So, the
routines in cscan.c acquire an exclusive LWLock prior to invoking
ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can become a bottleneck for concurrency. Better to
analyze that once we have the performance report.

Well, concurrent updates on a particular table may cause lock contention
due to the giant lock. On the other hand, my headache is how to avoid
deadlocks if we try to implement this with finer-grained locking. Consider
per-chunk locking: we also need to take locks on the neighbor nodes when a
record is moved out, while some other process may concurrently try to move
another record in the inverse order. That is a recipe for deadlock.

Is there an idea or a reference for implementing concurrent tree-structure
updates?

Anyway, it is a good idea to measure the impact of concurrent updates on
cached tables, to find out the significance of lock splitting.

We can do some of the following things:
1. Let only inserts take the exclusive lock.
2. Always follow the locking order from the root to the children.
3. For deletes, take the exclusive lock only on the exact node where the
delete happens.
We will identify more such options based on the performance data.

One more interesting document I found on the net while searching for T-tree
concurrency says that a B-tree can outperform a T-tree as an in-memory
index, at the cost of somewhat higher memory usage. The document is
attached to this mail.

Regards,
Hari Babu
Fujitsu Australia

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-4.v7.patch (application/octet-stream)
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   81 ++
 contrib/cache_scan/ccache.c                        | 1504 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  921 ++++++++++++
 doc/src/sgml/cache-scan.sgml                       |  266 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/custom-scan.sgml                      |   14 +
 doc/src/sgml/filelist.sgml                         |    1 +
 src/backend/access/heap/pruneheap.c                |   13 +
 src/backend/utils/time/tqual.c                     |    7 +
 src/include/access/heapam.h                        |    7 +
 14 files changed, 2868 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..c409817
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..c9cb259
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,81 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greaternode, if exist */
+	struct ccache_chunk *left;	/* link to the less node, if exist */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	uint32			deadspace;	/* payload by dead tuples */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+/*
+ * Status flag of columnar cache. A ccache_head is created with status of
+ * CCACHE_STATUS_INITIALIZED, then someone picks up the cache_head from
+ * the hash table and marks it as CCACHE_STATUS_IN_PROGRESS; that means
+ * this cache is under construction by a particular scan. Once it got
+ * constructed, it shall have CCACHE_STATUS_CONSTRUCTED state.
+ */
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLock			lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] or
+								 * free_list. Elsewhere, unlinked */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern Bitmapset *ccache_new_attribute_set(Oid tableoid,
+										   Bitmapset *required,
+										   Bitmapset *existing);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+extern void untrack_ccache_locally(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..2fdd2ae
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1504 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for columns-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/barrier.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	dlist_head		slots[FLEXIBLE_ARRAY_MEMBER];
+} ccache_hash;
+
+/*
+ * shmseg_head
+ *
+ * A data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory segment on its startup time, then it shall be split into
+ * multiple fixed-length memory blocks. All (internal) memory allocation and
+ * release shall be done by a block, to avoid memory fragmentation that
+ * eventually makes implementation complicated.
+ *
+ * The shmseg_head has a spinlock and global free_list to link free blocks.
+ * Any elements in its blocks[] array represents the state of a particular
+ * block being associated with. If it is chained to the free_list, it means
+ * this block is not allocated yet. Elsewhere, it is allocated to someone,
+ * thus unavailable to allocate it.
+ *
+ * A block-mapped region is dealt with a ccache_chunk. This structure has
+ * some fixed-length field and variable length array to store pointers of
+ * HeapTupleData. This array will grow up from the head to tail direction
+ * according to the number of tuples being stored on the block. On the
+ * other hands, contents of heap-tuple shall be put on the tail of blocks,
+ * then its usage will grow up from the tail to head direction.
+ * Thus, a chunk (= a block) can store multiple heap-tuples unless its
+ * usage for the pointer array does not cross its usage for the contents
+ * of heap-tuples.
+ *
+ * [layout of a block]
+ * +------------------------+  +0
+ * | struct ccache_chunk {  |
+ * |       :                |
+ * |       :                |
+ * |   HeapTuple tuples[];  |
+ * | };    :                |
+ * |       |                |
+ * |       v                |
+ * |                        |
+ * |                        |
+ * |       ^                |
+ * |       |                |
+ * |   buffer for           |
+ * | tuple contents         |
+ * |       |                |
+ * |       :                |
+ * +------------------------+  +(shmseg_blocksize - 1)
+ */
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	dlist_node		blocks[FLEXIBLE_ARRAY_MEMBER];
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+						(Size)shmseg_num_blocks * (Size)shmseg_blocksize)))
+
+/*
+ * cchunk_sanity_check - for debugging
+ */
+static void
+cchunk_sanity_check(ccache_chunk *cchunk)
+{
+#ifdef USE_ASSERT_CHECKING
+	ccache_chunk   *uchunk = cchunk->upper;
+
+	Assert(!uchunk || uchunk->left == cchunk || uchunk->right == cchunk);
+	AssertIfNotShmem(cchunk->right);
+	AssertIfNotShmem(cchunk->left);
+
+	Assert(cchunk->usage <= shmseg_blocksize);
+	Assert(offsetof(ccache_chunk, tuples[cchunk->ntups]) <= cchunk->usage);
+#if 0	/* more nervous sanity checks */
+	{
+		int		i;
+		for (i=0; i < cchunk->ntups; i++)
+		{
+			HeapTuple	tuple = cchunk->tuples[i];
+
+			Assert(tuple != NULL &&
+				   (ulong)tuple >= (ulong)(&cchunk->tuples[cchunk->ntups]) &&
+				   (ulong)tuple < (ulong)cchunk + shmseg_blocksize);
+			Assert(tuple->t_data != NULL &&
+				   (ulong)tuple->t_data >= (ulong)tuple &&
+				   (ulong)tuple->t_data < (ulong)cchunk + shmseg_blocksize);
+		}
+	}
+#endif
+#endif
+}
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * It is a callback to put ccache_head being acquired locally, to keep
+ * consistency of reference counter.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+		cchunk->deadspace = 0;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == 0 ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+		if (cchunk->right)
+			cchunk->right->upper = cchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+		cchunk->upper = rchunk;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			Assert(upper->right == cchunk);
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+		if (cchunk->left)
+			cchunk->left->upper = cchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+		cchunk->upper = lchunk;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		else
+		{
+			Assert(upper->left == cchunk);
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+	cchunk_sanity_check(cchunk);
+}
+
+/* it computes "actual" free space we can use right now */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+/* it computes "expected" free space we can use if compaction */
+#define cchunk_availablespace(cchunk)	\
+	(cchunk_freespace(cchunk) + (cchunk)->deadspace)
+
+/*
+ * ccache_chunk_compaction
+ *
+ * It moves existing tuples to eliminate dead spaces of the chunk.
+ * Eventually, chunk's deadspace shall become zero.
+ */
+static void
+ccache_chunk_compaction(ccache_chunk *cchunk)
+{
+	ccache_chunk   *temp = alloca(shmseg_blocksize);
+	int				i;
+
+	/* setting up temporary chunk */
+	temp->upper		= cchunk->upper;
+	temp->right		= cchunk->right;
+	temp->left		= cchunk->left;
+	temp->r_depth	= cchunk->r_depth;
+	temp->l_depth	= cchunk->l_depth;
+	temp->ntups		= cchunk->ntups;
+	temp->usage		= shmseg_blocksize;
+	temp->deadspace	= 0;
+
+	for (i=0; i < cchunk->ntups; i++)
+	{
+		HeapTuple	tuple = cchunk->tuples[i];
+		HeapTuple	dest;
+		uint32		required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
+		uint32		offset;
+
+		Assert(required <= cchunk_freespace(temp));
+
+		temp->usage -= required;
+		offset = temp->usage;
+
+		dest = (HeapTuple)((char *)temp + offset);
+		memcpy(dest, tuple, HEAPTUPLESIZE);
+		memcpy((char *)dest + HEAPTUPLESIZE,
+			   tuple->t_data, tuple->t_len);
+		/*
+		 * contents of temp chunk shall be copied later, so all the pointer
+		 * values needs to point a field on the cchunk, with same offset.
+		 */
+		dest->t_data = (HeapTupleHeader)((char *)cchunk +
+										 temp->usage + HEAPTUPLESIZE);
+		temp->tuples[i] = (HeapTuple)((char *)cchunk + offset);
+	}
+	elog(LOG, "chunk compaction: old usage=%u -> new usage=%u",
+		 cchunk->usage, temp->usage);
+	memcpy(cchunk, temp, shmseg_blocksize);
+	cchunk_sanity_check(cchunk);
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple, but uncached columns are dropped off,
+ * onto the ccache_head. If no space is left, it expands the t-tree
+ * structure with a chunk newly allocated. If no shared memory space was
+ * left, it returns false.
+ */
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	uint32		required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
+
+	if (required > cchunk_freespace(cchunk))
+		ccache_chunk_compaction(cchunk);
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		memmove(&cchunk->tuples[i_min + 1],
+				&cchunk->tuples[i_min],
+				sizeof(HeapTuple) * (cchunk->ntups - i_min));
+	}
+	cchunk->usage -= required;
+	newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+	memcpy(newtup, tuple, HEAPTUPLESIZE);
+	newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+	memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+	cchunk->tuples[i_min] = newtup;
+	cchunk->ntups++;
+
+	cchunk_sanity_check(cchunk);
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->deadspace += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It find a tuple that satisfies the supplied ItemPointer according to
+ * the ScanDirection. If NoMovementScanDirection, it returns a tuple that
+ * has strictly same ItemPointer. On the other hand, it returns a tuple
+ * that has the least ItemPointer greater than the supplied one if
+ * ForwardScanDirection, and also returns a tuple with the greatest
+ * ItemPointer smaller than the supplied one if BackwardScanDirection.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the properties of tuple being already cached, usually
+ * for deletion. 
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_chunk
+ *
+ * It merges two chunks if these have enough free space to consolidate
+ * its contents into one.
+ */
+static void
+ccache_merge_chunk(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	ccache_chunk   *curr;
+	ccache_chunk  **upper;
+	int			   *p_depth;
+	int				i;
+	long			required;
+	bool			needs_rebalance = false;
+
+	cchunk_sanity_check(cchunk);
+
+	/*
+	 * Find the least right node that has no left node; that is the neighbor
+	 * one with greater ctid's
+	 */
+	upper = &cchunk->right;
+	p_depth = &cchunk->r_depth;
+	curr = cchunk->right;
+	while (curr != NULL)
+	{
+		cchunk_sanity_check(curr);
+		if (curr->left)
+		{
+			upper = &curr->left;
+			p_depth = &curr->l_depth;
+			curr = curr->left;
+			continue;
+		}
+
+		required = (shmseg_blocksize - curr->usage +
+					sizeof(HeapTuple) * curr->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			for (i=0; i < curr->ntups; i++)
+			{
+				HeapTuple	oldtup = curr->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data
+					= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data,
+					   MAXALIGN(oldtup->t_len));
+				cchunk->tuples[cchunk->ntups++] = newtup;
+			}
+			/* detach the current chunk */
+			*upper = curr->right;
+			*p_depth = curr->r_depth;
+			if (curr->right)
+				curr->right->upper = curr->upper;
+
+			/* release it */
+			memset(curr, 0xdeadbeaf, shmseg_blocksize);
+			cs_free_shmblock(curr);
+			needs_rebalance = true;
+		}
+		cchunk_sanity_check(cchunk);
+		break;
+	}
+
+	/*
+	 * Find the greatest left node that has no right node; that is the neighbor
+	 * one with less ctid's
+	 */
+	upper = &cchunk->left;
+	p_depth = &cchunk->l_depth;
+	curr = cchunk->left;
+
+	while (curr != NULL)
+	{
+		cchunk_sanity_check(curr);
+		if (curr->right)
+		{
+			upper = &curr->right;
+			p_depth = &curr->r_depth;
+			curr = curr->right;
+			continue;
+		}
+
+	    required = (shmseg_blocksize - curr->usage +
+					sizeof(HeapTuple) * curr->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+
+			/* merge contents */
+			memmove(&cchunk->tuples[curr->ntups],
+					&cchunk->tuples[0],
+					sizeof(HeapTuple) * cchunk->ntups);
+			cchunk->ntups += curr->ntups;
+
+			for (i=0; i < curr->ntups; i++)
+			{
+				HeapTuple	oldtup = curr->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= HEAPTUPLESIZE + MAXALIGN(oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data
+					= (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data,
+					   MAXALIGN(oldtup->t_len));
+				cchunk->tuples[i] = newtup;
+			}
+			/* detach the current chunk */
+			*upper = curr->left;
+			*p_depth = curr->l_depth;
+			if (curr->left)
+				curr->left->upper = curr->upper;
+			/* release it */
+			memset(curr, 0xfee1dead, shmseg_blocksize);
+			cs_free_shmblock(curr);
+			needs_rebalance = true;
+		}
+		cchunk_sanity_check(cchunk);
+		break;
+	}
+	/* Rebalance the tree, if needed */
+	if (needs_rebalance)
+		ccache_rebalance_tree(ccache, cchunk);
+}
+
+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims the tuples being already vacuumed. It shall be kicked on
+ * the callback function of heap_page_prune_hook to synchronize contents
+ * of the cache with on-disk image.
+ */
+static ccache_chunk *
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+			return ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+			return ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+	}
+	else
+	{
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max && i_min < cchunk->ntups);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+			int			j;
+
+			for (j=i_min+1; j < cchunk->ntups; j++)
+				cchunk->tuples[j-1] = cchunk->tuples[j];
+			cchunk->ntups--;
+			cchunk->deadspace += length;
+		}
+		else
+			elog(LOG, "ctid (%u,%u) was not on columnar cache",
+				 ItemPointerGetBlockNumber(ctid),
+				 ItemPointerGetOffsetNumber(ctid));
+
+		return cchunk;
+	}
+	return NULL;
+}
+
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* Note that it needs buffer being valid and pinned */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+	ccache_chunk   *pchunk = NULL;
+	ccache_chunk   *cchunk;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+		if (pchunk != NULL && pchunk != cchunk)
+			ccache_merge_chunk(ccache, pchunk);
+		pchunk = cchunk;
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		int		i;
+
+		PG_TRY();
+		{
+			for (i=0; i < 20; i++)
+			{
+				entry = MemoryContextAlloc(TopMemoryContext,
+										   sizeof(ccache_entry));
+				dlist_push_tail(&ccache_free_list, &entry->chain);
+			}
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+
+	/*
+	 * Here is no columnar cache of this relation or cache attributes are
+	 * not enough to run the required query. So, it tries to create a new
+	 * ccache_head for the upcoming cache-scan.
+	 * Also allocate ones, if we have no free ccache_head any more.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	LWLockInitialize(&new_cache->lock, 0);
+	new_cache->refcnt = 1;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find out existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			/* chose a set of columns to be cached */
+			if (old_cache)
+				attrs_used = ccache_new_attribute_set(tableoid,
+													  attrs_used,
+													  &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(&ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(&ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. This routine intentionally does not
+ * support variable-length allocation, to keep the logic simple for its purpose.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	void		   *address = NULL;
+	int				index;
+	int				retry = 2;
+
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			pg_memory_barrier();
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache_nolock(ccache);
+			else
+				dlist_move_head(&cs_ccache_hash->lru_list, &ccache->lru_chain);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+
+		index = dnode - cs_shmseg_head->blocks;
+		Assert(index >= 0 && index < shmseg_num_blocks);
+
+		memset(dnode, 0, sizeof(dlist_node));
+		address = (void *)((char *)cs_shmseg_head->base_address + 
+						   index * shmseg_blocksize);
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size		curr = (Size) address;
+	Size		base = cs_shmseg_head->base_address;
+	ulong		index;
+	dlist_node *dnode;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	dnode = &cs_shmseg_head->blocks[index];
+	Assert(dnode->prev == NULL && dnode->next == NULL);
+
+	dlist_push_head(&cs_shmseg_head->free_list, dnode);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	int		i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash
+		= ShmemInitStruct("cache_scan: hash of columnar cache",
+						  MAXALIGN(offsetof(ccache_hash,
+											slots[ccache_hash_size])),
+						  &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 (Size)shmseg_num_blocks *
+									 (Size)shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+	cs_shmseg_head->base_address
+		= MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		dlist_push_tail(&cs_shmseg_head->free_list,
+						&cs_shmseg_head->blocks[i]);
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							256,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   (Size)shmseg_num_blocks * (Size)shmseg_blocksize);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..c28295b
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,921 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing column
+ * oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool		cache_scan_enabled;
+static double	cache_scan_width_threshold;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row references are not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Try to get an existing cache. If it exists, we assume the cache will
+	 * probably be available at the time this plan is executed.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		double	usage_ratio;
+		int		total_width = 0;
+		int		tuple_width = 0;
+
+		/*
+		 * Estimate the width of the referenced columns - it does not make
+		 * sense to construct a new cache if their total width exceeds the
+		 * configured percentage of the average record width; usually 30%.
+		 */
+		for (i=0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = tupdesc->attrs[i];
+			int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+			int		width;
+
+			if (attr->attlen > 0)
+				width = attr->attlen;
+			else
+				width = get_attavgwidth(tableoid, attr->attnum);
+
+			total_width += width;
+			if (bms_is_member(attidx, *attrs_used))
+				tuple_width += width;
+		}
+		usage_ratio = (double)tuple_width / (double)total_width;
+		if (usage_ratio > cache_scan_width_threshold / 100.0)
+			return false;
+
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		/*
+		 * If an existing cache already holds all the required attributes,
+		 * we don't need to care about the width of the cached columns
+		 * (it is obviously below the threshold already).
+		 */
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * ccache_new_attribute_set
+ *
+ * It selects the attributes to be cached. If some of the newly required
+ * attributes are not cached yet, we re-construct a new cache that holds the
+ * union of both attribute sets, as long as its width does not grow beyond
+ * the configured threshold. If the (required | existing) set is wider than
+ * the threshold, we drop attributes from (~required & existing).
+ * The total width of the required columns alone is usually below the
+ * threshold because of the checks at the planning stage.
+ */
+Bitmapset *
+ccache_new_attribute_set(Oid tableoid,
+						 Bitmapset *required, Bitmapset *existing)
+{
+	Form_pg_class	relform;
+	HeapTuple		reltup;
+	Bitmapset	   *difference;
+	int			   *attrs_width;
+	int				i, anum;
+	int				total_width;
+	int				required_width;
+	int				union_width;
+	double			usage_ratio;
+
+	reltup = SearchSysCache1(RELOID, ObjectIdGetDatum(tableoid));
+	if (!HeapTupleIsValid(reltup))
+		elog(ERROR, "cache lookup failed for relation %u", tableoid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	attrs_width = palloc0(sizeof(int) * relform->relnatts);
+
+	total_width = 0;
+	required_width = 0;
+	union_width = 0;
+	for (anum = 1; anum <= relform->relnatts; anum++)
+	{
+		Form_pg_attribute	attform;
+		HeapTuple			atttup;
+
+		atttup = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(tableoid),
+								 Int16GetDatum(anum));
+		if (!HeapTupleIsValid(atttup))
+			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+				 anum, tableoid);
+		attform = (Form_pg_attribute) GETSTRUCT(atttup);
+
+		if (attform->attisdropped)
+		{
+			ReleaseSysCache(atttup);
+			continue;
+		}
+
+		if (attform->attlen > 0)
+			attrs_width[anum - 1] = attform->attlen;
+		else
+			attrs_width[anum - 1] = get_attavgwidth(tableoid, anum);
+
+		total_width += attrs_width[anum - 1];
+		i = anum - FirstLowInvalidHeapAttributeNumber;
+		if (bms_is_member(i, required))
+		{
+			required_width += attrs_width[anum - 1];
+			union_width += attrs_width[anum - 1];
+		}
+		else if (bms_is_member(i, existing))
+			union_width += attrs_width[anum - 1];
+
+		ReleaseSysCache(atttup);
+	}
+	ReleaseSysCache(reltup);
+
+	/*
+	 * An easy case: if the union of both attribute sets is still below the
+	 * threshold, we don't need to drop any columns; just take the union.
+	 */
+	usage_ratio = (double) union_width / (double) total_width;
+	if (usage_ratio <= cache_scan_width_threshold / 100.0)
+		return bms_union(required, existing);
+
+	/*
+	 * Otherwise, we repeatedly drop the widest column that is not referenced
+	 * by the upcoming query, until the width of the cache falls below the
+	 * threshold.
+	 */
+	difference = bms_difference(existing, required);
+	do {
+		Bitmapset  *tempset = bms_copy(difference);
+		int			maxwidth = -1;
+		AttrNumber	maxwidth_anum = 0;
+
+		Assert(!bms_is_empty(tempset));
+		union_width = required_width;
+		while ((i = bms_first_member(tempset)) >= 0)
+		{
+			anum = i + FirstLowInvalidHeapAttributeNumber;
+
+			union_width += attrs_width[anum - 1];
+			if (attrs_width[anum - 1] > maxwidth)
+			{
+				maxwidth = attrs_width[anum - 1];
+				maxwidth_anum = anum;
+			}
+		}
+		pfree(tempset);
+
+		/* drop a column that has largest length */
+		Assert(maxwidth_anum > 0);
+		i = maxwidth_anum - FirstLowInvalidHeapAttributeNumber;
+		difference = bms_del_member(difference, i);
+		union_width -= maxwidth;
+
+		usage_ratio = (double) union_width / (double) total_width;
+	} while (usage_ratio > cache_scan_width_threshold / 100.0);
+
+	pfree(attrs_width);
+
+	return bms_union(required, difference);
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have a columnar cache also needs synchronizer triggers,
+ * to ensure the on-memory cache keeps up with the latest contents of the
+ * heap. This function returns TRUE if the supplied relation has triggers
+ * that invoke cache_scan_synchronizer in all the appropriate contexts;
+ * otherwise, FALSE is returned.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgfoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgfoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook, if it exists */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (!cache_scan_enabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* Core code should have already acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache scan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+        required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* nothing special to push down here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_SHARED);
+		if (ccache->status < CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(&ccache->lock);
+			LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(&ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch a tuple that is invisible to us, because the columnar cache
+ * stores all live tuples, including ones updated or deleted by concurrent
+ * sessions. So, it is the caller's job to check MVCC visibility.
+ * This function decides whether we need to move on to the next tuple because
+ * of the visibility condition. If the given tuple is NULL, it is obviously
+ * time to stop searching because there are no more tuples in the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility ? true : false;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		if (!ccache)
+		{
+			/*
+			 * ccache == NULL implies one of two cases: (1) a fallback path
+			 * using a regular sequential scan instead of a cache-only scan,
+			 * or (2) cache construction failed during the scan. We need to
+			 * pay attention to the latter case because it uses SnapshotAny,
+			 * thus it fetches all the tuples including invisible ones.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+			buffer = scan->rs_cbuf;
+		}
+		else if (csstate->with_construction)
+		{
+			/*
+			 * "with_construction" means the columnar cache is under
+			 * construction, so we need to fetch a tuple from the heap of
+			 * the target relation and insert it into the cache.
+			 * Note that we use SnapshotAny to fetch all the tuples, both
+			 * visible and invisible ones, so it is our responsibility
+			 * to check tuple visibility against the snapshot of the
+			 * current estate.
+			 * The same applies when we fetch tuples from the cache,
+			 * without referencing a heap buffer.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			if (HeapTupleIsValid(tuple))
+			{
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(&ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple fails, it usually
+					 * implies a lack of shared memory, so we are
+					 * unable to continue constructing the columnar
+					 * cache. We release the cache while it remains
+					 * in the in-progress status, which prevents
+					 * others from grabbing it again, and move to a
+					 * regular sequential scan for the remaining
+					 * portion.
+					 */
+					cs_put_ccache(ccache);
+					LWLockRelease(&ccache->lock);
+					csstate->ccache = NULL;
+				}
+				buffer = scan->rs_cbuf;
+			}
+			else
+			{
+				/*
+				 * Once we reach the end of the relation, the columnar
+				 * cache has been fully constructed.
+				 */
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(&ccache->lock);
+				buffer = scan->rs_cbuf;
+			}
+		}
+		else
+		{
+			LWLockAcquire(&ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(&ccache->lock);
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(&ccache->lock);
+
+		/*
+		 * If the status of the columnar cache is still "in-progress",
+		 * the table scan did not reach the end of the relation, so the
+		 * columnar cache was not constructed completely.
+		 * Otherwise, we keep the ccache that was originally created with
+		 * refcnt=1, but untrack it locally.
+		 */
+		if (needs_remove || !csstate->with_construction)
+			cs_put_ccache(ccache);
+		else if (csstate->with_construction)
+			untrack_ccache_locally(ccache);
+	}
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * trigger function to synchronize the columnar-cache with heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+                 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+                 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(&ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache if the cached table was altered or
+ * dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE need cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * It is a callback invoked when a particular heap block gets vacuumed.
+ * On vacuuming, the dead space occupied by dead tuples is reclaimed and
+ * tuple locations may be moved.
+ * This routine reclaims the space held by dead tuples on the columnar
+ * cache as well, following the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	/*
+	 * If the relation already has a columnar cache, it also needs to be
+	 * cleaned up in step with the heap vacuuming.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(&ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.enabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_enabled,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	DefineCustomRealVariable("cache_scan.width_threshold",
+							 "threshold percentage to be cached",
+							 NULL,
+							 &cache_scan_width_threshold,
+							 30.0,
+							 0.0,
+							 100.0,
+							 PGC_SIGHUP,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..e988b7a
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,266 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache-scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap scan,
+   in case a previous scan has already loaded the contents of the table into
+   the cache.
+   Unlike the buffer cache, it holds the contents of a limited number of
+   columns rather than whole records, so it can hold a larger number of
+   records in the same amount of RAM. This characteristic is particularly
+   useful for analytic queries on tables with many columns and records.
+  </para>
+  <para>
+   Once this module gets loaded, it registers itself as a custom-scan provider.
+   It can then offer an additional scan path on regular relations that uses
+   the on-memory columnar cache instead of a regular heap scan.
+   It also serves as a proof-of-concept implementation of the custom-scan API,
+   which allows extensions to extend the core executor system.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded via the
+   <xref linkend="guc-shared-preload-libraries"> parameter, so that it can
+   acquire a certain amount of shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented with the <literal>cache_scan_synchronizer</>
+   function, which keep the cache contents in sync with the latest heap
+   contents on <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> or
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can set up this extension according to the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create the synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create the synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
+    </para>
+   </step>
+  </procedure>
+ </sect2>
+
+ <sect2>
+  <title>How it works</title>
+  <para>
+   This module behaves in the usual fashion of <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has
+   synchronizer triggers and the width of the referenced columns is less than
+   30% of the average record width.
+   The query optimizer then picks the cheapest path. If the chosen path is
+   a custom-scan path managed by <filename>cache_scan</>, the scan runs on the
+   target relation using the columnar cache.
+   On the first run, it constructs the relation's cache alongside a regular
+   sequential scan. On subsequent runs, it can scan the columnar cache without
+   referencing the heap.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> with the
+   <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, has one or more
+   chunks that act as the nodes and leaves of a T-tree structure.
+   The <literal>cache_scan_debuginfo()</> function can dump useful information
+   about the properties of all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ct
+id
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each chunk
+   holds an array of partial tuples along with their min- and max- values.
+   Its left node is linked to the chunks that hold tuples with smaller
+   <literal>ctid</> values, and its right node is linked to the chunks that
+   hold larger ones. This makes it possible to locate tuples quickly when they
+   need to be invalidated because of heap updates by DDL, DML or vacuuming.
+  </para>
+  <para>
+   The columnar cache is not owned by any particular session, so it is
+   retained until it is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of each block on the shared memory
+      segment for the columnar cache. A postmaster restart is needed for a
+      change to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for columnar caches on demand.
+      Too large a block size reduces the flexibility of memory assignment,
+      and too small a block size spends too much management area per block.
+      So, we recommend keeping the default value; that is 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks on the shared memory
+      segment for the columnar cache. A postmaster restart is needed for a
+      change to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for columnar caches on demand.
+      Too small a number of blocks reduces the flexibility of memory
+      assignment and may cause undesired cache dropping.
+      So, we recommend configuring enough blocks to keep the contents of the
+      target relations in memory.
+      The default is <literal>64</literal>, which is probably too small for
+      most real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots in the internal hash table
+      that links every columnar cache, keyed by table OID.
+      The default is <literal>128</>; there is usually no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number that can be held
+      in a columnar cache. Because of the internal data representation, the
+      bitmap set that tracks which attributes are cached has to be of fixed
+      length, so the largest attribute number needs to be fixed in advance.
+      The default is <literal>128</>; most tables likely have fewer than
+      100 columns.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-enabled" xreflabel="cache_scan.enabled">
+    <term><varname>cache_scan.enabled</> (<type>boolean</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.enabled</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter enables or disables the query planner's use of
+      cache-only scans, even if they are ready to run.
+      Note that this parameter does not affect the synchronizer triggers,
+      so an existing columnar cache that has already been constructed keeps
+      being synchronized, even if cache-only scan is disabled later.
+      The default is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-width_threshold" xreflabel="cache_scan.width_threshold">
+    <term><varname>cache_scan.width_threshold</> (<type>float</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.width_threshold</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the threshold at which a cache-only scan plan is
+      proposed to the planner. (If the proposed scan plan is cheap enough, the
+      planner will choose it instead of the built-in ones.)
+      This extension tries to build a cache-only scan plan if the total width
+      of the referenced columns is less than this percentage of the average
+      record width.
+      The default is <literal>30.0</>, which means a cache-only scan plan is
+      proposed to the planner if the summed width of the referenced columns is
+      less than <literal>(30.0 / 100.0) x (average width of table)</>.
+     </para>
+     <para>
+      Because the columnar cache feature only makes sense when the width of
+      the cached columns is much smaller than the total width of the table
+      definition, this threshold keeps the extension away from scans that
+      reference many columns; such scans would consume an unignorable amount
+      of shared memory and eventually kill the benefit.
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 7042d76..d588c3d 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &ctidscan;
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index f53902d..218a5fd 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -55,6 +55,20 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term><xref linkend="cache-scan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module scans the on-memory columnar cache
+      instead of the heap, if such a cache has already been constructed for
+      the target relation.
+      Unlike the buffer cache, it holds a limited number of columns that have
+      been referenced before, rather than all the columns in the table
+      definition. Thus, it can cache a much larger number of records in
+      memory than the buffer cache.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </para>
  <para>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index aa2be4b..10c7666 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
#10Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kohei KaiGai (#9)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Hello,

The attached patch is a revised one for cache-only scan module
on top of custom-scan interface. Please check it.

Thanks for the revised patch. Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,
+   tuple->t_data, tuple->t_len);

For a normal tuple, the tuple header and its data are located at different
addresses, but in the case of ccache they form one continuous block of memory.
It would be better to add a comment noting that even though the memory is
continuous, the two parts are still copied and treated separately (see the
sketch after this list).

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already MAXALIGNed, so there is no problem applying MAXALIGN again,
but the required-length calculation differs from function to function.
For example, in a later part of the same function, the same t_len is used
directly. It doesn't cause any problem, but it may cause confusion.

3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)
+ ccache_merge_chunk(ccache, pchunk);
+ pchunk = cchunk;

ccache_merge_chunk is called only when the heap tuples are spread across two
cache chunks. Actually, one cache chunk can accommodate one or more heap
pages, so this needs to be handled differently.

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

5. "columner" is present in sgml file. correct it.

6. "max_cached_attnum" value in the document saying as 128 by default but
in the code it set as 256.
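
To illustrate point 1, here is a minimal sketch of the copy pattern being
discussed (not the patch's actual code; the function name is made up and the
usual postgres heap-tuple headers are assumed). Even though the cache
allocates header and data as one continuous block, the tuple still keeps the
usual header/data split, and t_data has to be re-pointed into the same block:

static HeapTuple
copy_tuple_into_ccache(void *dest, HeapTuple tuple)
{
    HeapTuple   cached = (HeapTuple) dest;

    /* header part: the fixed-size HeapTupleData struct */
    memcpy(dest, tuple, HEAPTUPLESIZE);
    /* data part: placed immediately after the header in the same block */
    memcpy((char *) dest + HEAPTUPLESIZE, tuple->t_data, tuple->t_len);
    /* make the header point at the in-cache data, not the original buffer */
    cached->t_data = (HeapTupleHeader) ((char *) dest + HEAPTUPLESIZE);

    return cached;
}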

I will start regress and performance tests. I will let you know the results
once I finish.

Regards,
Hari Babu
Fujitsu Australia

#11Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#10)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Mon, Feb 24, 2014 at 2:41 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Hello,

The attached patch is a revised one for cache-only scan module
on top of custom-scan interface. Please check it.

I will start regress and performance tests. I will let you know the results
once I finish.

Getting some compilation warnings while compiling the extension and also I
am not able to load the extension because of undefined symbol
"get_restriction_qual_cost".

cscan.c: In function 'cs_estimate_costs':
cscan.c:163: warning: implicit declaration of function
'get_restriction_qual_cost'
cscan.c: In function 'cs_add_scan_path':
cscan.c:437: warning: implicit declaration of function 'bms_to_string'
cscan.c:437: warning: passing argument 1 of 'makeString' makes pointer from
integer without a cast
cscan.c: In function 'cs_begin_custom_scan':
cscan.c:493: warning: implicit declaration of function 'bms_from_string'
cscan.c:493: warning: assignment makes pointer from integer without a cast

FATAL: could not load library "/postgresql/cache_scan.so":
/postgresql/cache_scan.so: undefined symbol: get_restriction_qual_cost

Regards,
Hari Babu
Fujitsu Australia

#12Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#11)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Thanks for your testing,

Getting some compilation warnings while compiling the extension and also
I am not able to load the extension because of undefined symbol
"get_restriction_qual_cost".

It seems to me you applied only part-1 portion of the custom-scan patches.

get_restriction_qual_cost() is redefined as an extern function (it used to be
static) in the part-2 portion (ctidscan), so that extensions can estimate the
cost of qualifiers. Also, bms_to_string() and bms_from_string() are added in
the part-3 portion (postgres_fdw) to carry a bitmapset in the copyObject
manner.

It may make sense to include the above supplemental changes in the cache-scan
feature as well, until either of those patch series gets upstreamed.
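
For reference, a rough sketch of the declarations this assumes (signatures
inferred from how the extension calls them and from the part-2/part-3
patches; treat them as illustrative, not authoritative):

/* part-2 (ctidscan): exposed from costsize.c so extensions can cost quals */
extern void get_restriction_qual_cost(PlannerInfo *root,
                                      RelOptInfo *baserel,
                                      ParamPathInfo *param_info,
                                      QualCost *qpqual_cost);

/* part-3 (postgres_fdw): carry a Bitmapset as a string, copyObject-friendly */
extern char *bms_to_string(const Bitmapset *bms);
extern Bitmapset *bms_from_string(char *str);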

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#13Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kouhei Kaigai (#12)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>wrote:

Thanks for your testing,

I am getting some compilation warnings while compiling the extension, and I am
also unable to load it because of the undefined symbol
"get_restriction_qual_cost".

It seems to me you applied only the part-1 portion of the custom-scan patches.

get_restriction_qual_cost() is redefined from a static function to an extern
one in the part-2 portion (ctidscan), so that extensions can estimate the cost
of qualifiers. Also, bms_to_string() and bms_from_string() are added in the
part-3 portion (postgres_fdw) to carry a bitmapset in the copyObject manner.

It may make sense to include these supplemental changes in the cache-scan
feature as well, until either of those patches gets upstreamed.

Thanks for the information, I will apply other patches also and start
testing.

Regards,
Hari Babu
Fujitsu Australia

#14Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#13)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi
<kommi.haribabu@gmail.com>wrote:

Thanks for the information, I will apply other patches also and start
testing.

When trying to run the pgbench test, the cache-scan plan is not chosen by
default because of its higher cost. So I either increased cpu_index_tuple_cost
to a very large value or turned off index scans, so that the planner would
choose cache_scan as the cheapest plan.

The configuration parameters changed during the test are:

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 100000 (or enable_indexscan = off)

Test procedure:
1. Initialize the database with pgbench with 75 scale factor.
2. Create the triggers on pgbench_accounts
3. Use a select query to load all the data into cache.
4. Run a simple update pgbench test.

Plan details of pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where
aid = 100000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Update on pgbench_accounts (cost=0.43..100008.44 rows=1 width=103)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.43..100008.44 rows=1 width=103)
Index Cond: (aid = 100000)
Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 100000;
QUERY PLAN
------------------------------------------------------------------------------------
Custom Scan (cache scan) on pgbench_accounts (cost=0.00..99899.99 rows=1
width=4)
Filter: (aid = 100000)
Planning time: 0.042 ms
(3 rows)

I am observing too much of a delay in the performance results. The performance
test script is attached to this mail.
Please let me know if you find any problem in the test.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

run_reading.shapplication/x-sh; name=run_reading.shDownload
#15Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#14)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Thanks for the information, I will apply other patches also and
start testing.

When trying to run the pgbench test, the cache-scan plan is not chosen by
default because of its higher cost. So I either increased cpu_index_tuple_cost
to a very large value or turned off index scans, so that the planner would
choose cache_scan as the cheapest plan.

That's expected. When an index scan is available, its cost is obviously
cheaper than a cache scan, even though the cache scan issues no disk I/O.

The configuration parameters changed during the test are,

shared_buffers - 2GB, cache_scan.num_blocks - 1024 wal_buffers - 16MB,
checkpoint_segments - 255 checkpoint_timeout - 15 min,
cpu_index_tuple_cost - 100000 or enable_indexscan=off

Test procedure:
1. Initialize the database with pgbench with 75 scale factor.
2. Create the triggers on pgbench_accounts 3. Use a select query to load
all the data into cache.
4. Run a simple update pgbench test.

Plan details of pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where
aid = 100000;
QUERY PLAN
----------------------------------------------------------------------
-------------------------------------
Update on pgbench_accounts (cost=0.43..100008.44 rows=1 width=103)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.43..100008.44 rows=1 width=103)
Index Cond: (aid = 100000)
Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid =
100000;
QUERY PLAN
----------------------------------------------------------------------
--------------
Custom Scan (cache scan) on pgbench_accounts (cost=0.00..99899.99
rows=1 width=4)
Filter: (aid = 100000)
Planning time: 0.042 ms
(3 rows)

I am observing too much of a delay in the performance results. The performance
test script is attached to this mail.

I would like you to compare two different cases: a sequential scan where part
of the buffers has to be loaded from storage, versus a cache-only scan.
That will probably show a difference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#16Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#10)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Thanks for your review.

According to the discussion in the Custom-Scan API thread, I moved
all the supplemental facilities (like bms_to/from_string) into the
main patch. So, you no longer need to apply the ctidscan and postgres_fdw
patches for testing.
(I'll submit the revised one later)

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,

+ tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different, but in the ccache case
they form one contiguous memory region.
It would be better to add a comment noting that even though the memory is
contiguous, the two structures are still treated as separate.

OK, I put a source code comment as follows:

/*
* Even though we place the body of HeapTupleHeaderData just after
* HeapTupleData, there is usually no guarantee that both data
* structures are located at contiguous memory addresses.
* So we explicitly adjust tuple->t_data to point to the area just
* behind itself, so that a HeapTuple on the columnar cache can be
* referenced like a regular one.
*/

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already MAXALIGNed, so there is no problem in aligning it again, but
the required-length calculation differs from function to function.
For example, in a later part of the same function, the same t_len is used
directly. It doesn't cause any problem, but it may be confusing.

I once tried to trust that t_len is well aligned, but an Assert() told me that
this is not a correct assumption. See heap_compute_data_size(): it computes the
length of the tuple body and adjusts alignment according to the "attalign"
value in pg_attribute, which is not necessarily the same as sizeof(Datum).
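
Just for illustration (this snippet is not from the patch, and the helper name
is made up), the pattern being discussed boils down to reserving HEAPTUPLESIZE
+ MAXALIGN(t_len) bytes, copying the HeapTupleData header and the tuple body
into that one contiguous area, and re-pointing t_data at the body:

#include "postgres.h"
#include "access/htup_details.h"

/*
 * Illustrative sketch only: copy a heap tuple so that HeapTupleData and
 * its body occupy one contiguous region.  "dest" must provide at least
 * HEAPTUPLESIZE + MAXALIGN(tuple->t_len) bytes; the extra MAXALIGN is
 * needed because t_len is padded per attribute (attalign), not
 * necessarily up to MAXIMUM_ALIGNOF.
 */
static HeapTuple
copy_tuple_contiguous(void *dest, HeapTuple tuple)
{
	HeapTuple	result = (HeapTuple) dest;

	memcpy(result, tuple, HEAPTUPLESIZE);		/* fixed HeapTupleData part */
	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
	memcpy(result->t_data, tuple->t_data, tuple->t_len);	/* tuple body */

	return result;
}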

3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)

+ ccache_merge_chunk(ccache, pchunk);

+ pchunk = cchunk;

ccache_merge_chunk() is called only when the heap tuples are spread across
two cache chunks. However, one cache chunk can actually accommodate one or
more heap pages, so this needs to be handled in some other way.

I adjusted the logic to merge the chunks as follows:

Once a tuple is vacuumed from a chunk, we also check whether the chunk can be
merged with one of its child leafs. A chunk has up to two children; the left
one holds smaller ctids than the parent, and the right one holds greater ctids.
This means a chunk without a right child in the left sub-tree, or a chunk
without a left child in the right sub-tree, is a neighbor of the chunk being
vacuumed. In addition, if the vacuumed chunk is missing either (or both) of its
children, it can be merged with its parent node.
I modified ccache_vacuum_tuple() to merge chunks during the t-tree walk-down
when the vacuumed chunk has enough free space.
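
To make the idea easier to follow, here is a much simplified sketch of that
walk-down-and-merge shape. It is not the code of the attached patch (the real
ccache_vacuum_tuple() also performs compaction, maintains depth counters and
rebalances the tree); it only shows the decision structure, with a plain
integer key standing in for a ctid:

#include <stddef.h>
#include <stdbool.h>

/* Simplified stand-in for a columnar-cache chunk in the t-tree. */
typedef struct chunk
{
	struct chunk *left;		/* sub-tree holding smaller keys */
	struct chunk *right;	/* sub-tree holding larger keys */
	int		min_key;		/* smallest key cached in this chunk */
	int		max_key;		/* largest key cached in this chunk */
	int		free_space;		/* bytes still free in this chunk */
} chunk;

/* Placeholder: the real code moves src's tuples into dst when dst has
 * enough room, then detaches src from the tree and releases its block. */
static bool
try_merge(chunk *dst, chunk *src)
{
	return (src != NULL && dst->free_space > 0);
}

/*
 * Walk down to the chunk whose key range covers "key", drop the tuple
 * there (elided), and merge the vacuumed chunk with a neighbour on the
 * way back up when that neighbour is adjacent in key order.
 * Returns the chunk the tuple was removed from, or NULL.
 */
static chunk *
vacuum_walkdown(chunk *c, int key)
{
	chunk  *v;

	if (c == NULL)
		return NULL;

	if (key < c->min_key)
	{
		v = vacuum_walkdown(c->left, key);
		/* a vacuumed chunk with no right child holds the largest keys of
		 * the left sub-tree, i.e. it is this chunk's direct neighbour */
		if (v && v->right == NULL)
			try_merge(c, v);
		return v;
	}
	if (key > c->max_key)
	{
		v = vacuum_walkdown(c->right, key);
		/* symmetric case: no left child means it holds the smallest keys
		 * of the right sub-tree */
		if (v && v->left == NULL)
			try_merge(c, v);
		return v;
	}
	/* key falls inside this chunk's own range */
	return c;
}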

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

On reflection, there is no good reason to construct multiple ccache_entries at
once. So I adjusted the code to create a new ccache_entry on demand, to track
each columnar cache being acquired.

5. "columner" is present in sgml file. correct it.

Sorry, fixed it.

6. "max_cached_attnum" value in the document saying as 128 by default but
in the code it set as 256.

Sorry, fixed it.

Also, I ran a benchmark of cache_scan in the case where this module performs
most effectively.

The table t1 is declared as follows:
create table t1 (a int, b float, c float, d text, e date, f char(200));
Its row width is roughly 256 bytes and it contains 4 million records, so the
total table size is almost 1GB.

* 1st trial - it takes longer than a sequential scan because of the
columnar-cache construction
postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=200791.62..200791.64 rows=1 width=0) (actual time=63105.036..63105.037 rows=1 loops=1)
-> Custom Scan (cache scan) on t1 (cost=0.00..200741.62 rows=20000 width=0) (actual time=7.397..62832.728 rows=400000 loops=1)
Filter: ((a % 10) = 5)
Rows Removed by Filter: 3600000
Planning time: 214.506 ms
Total runtime: 64629.296 ms
(6 rows)

* 2nd trial - it runs much faster than a sequential scan because there is no disk access
postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=67457.53..67457.54 rows=1 width=0) (actual time=7833.313..7833.313 rows=1 loops=1)
-> Custom Scan (cache scan) on t1 (cost=0.00..67407.53 rows=20000 width=0) (actual time=0.154..7615.914 rows=400000 loops=1)
Filter: ((a % 10) = 5)
Rows Removed by Filter: 3600000
Planning time: 1.019 ms
Total runtime: 7833.761 ms
(6 rows)

* 3rd trial - cache_scan is turned off, so the planner chooses the built-in SeqScan.
postgres=# set cache_scan.enabled = off;
SET
postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=208199.08..208199.09 rows=1 width=0) (actual time=59700.810..59700.810 rows=1 loops=1)
-> Seq Scan on t1 (cost=0.00..208149.08 rows=20000 width=0) (actual time=715.489..59518.095 rows=400000 loops=1)
Filter: ((a % 10) = 5)
Rows Removed by Filter: 3600000
Planning time: 0.630 ms
Total runtime: 59701.104 ms
(6 rows)

The reason for such an extreme result: I adjusted the system page cache usage
to constrain OS-level disk cache hits, so the sequential scan is dominated by
disk access performance in this case. On the other hand, the columnar cache was
able to hold the whole set of records, because it omits caching of unreferenced
columns.

* GUCs
shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 400

[kaigai@iwashi backend]$ free -m
total used free shared buffers cached
Mem: 7986 7839 146 0 2 572
-/+ buffers/cache: 7265 721
Swap: 8079 265 7814

Please don't throw stones at me. :-)
The primary purpose of this extension is to demonstrate usage of the
custom-scan interface and heap_page_prune_hook().
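
For reference, here is a hypothetical sketch of how an extension can chain into
the proposed hook from its _PG_init(); the callback name below is made up, and
the argument list simply follows the hook signature proposed in this patch
series:

#include "postgres.h"
#include "access/heapam.h"
#include "storage/buf.h"
#include "utils/rel.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* previous hook value, so multiple extensions can coexist */
static heap_page_prune_hook_type prev_prune_hook = NULL;

static void
example_prune_callback(Relation relation, Buffer buffer, int ndeleted,
					   TransactionId OldestXmin,
					   TransactionId latestRemovedXid)
{
	/*
	 * Here the extension would drop the entries of its own cache that
	 * correspond to the line pointers just reclaimed from this page.
	 */
	if (prev_prune_hook)
		prev_prune_hook(relation, buffer, ndeleted,
						OldestXmin, latestRemovedXid);
}

void
_PG_init(void)
{
	prev_prune_hook = heap_page_prune_hook;
	heap_page_prune_hook = example_prune_callback;
}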

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachments:

pgsql-v9.4-custom-scan.part-4.v9.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-4.v9.patchDownload
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   81 +
 contrib/cache_scan/ccache.c                        | 1553 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  929 ++++++++++++
 doc/src/sgml/cache-scan.sgml                       |  266 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/custom-scan.sgml                      |   14 +
 doc/src/sgml/filelist.sgml                         |    1 +
 src/backend/access/heap/pruneheap.c                |   13 +
 src/backend/utils/time/tqual.c                     |    7 +
 src/include/access/heapam.h                        |    7 +
 14 files changed, 2925 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..c409817
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..c9cb259
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,81 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greater node, if any */
+	struct ccache_chunk *left;	/* link to the lesser node, if any */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	uint32			deadspace;	/* payload by dead tuples */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+/*
+ * Status flag of columnar cache. A ccache_head is created with status of
+ * CCACHE_STATUS_INITIALIZED, then someone picks up the cache_head from
+ * the hash table and marks it as CCACHE_STATUS_IN_PROGRESS; that means
+ * this cache is under construction by a particular scan. Once it got
+ * constructed, it shall have CCACHE_STATUS_CONSTRUCTED state.
+ */
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLock			lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] or
+								 * free_list. Elsewhere, unlinked */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern Bitmapset *ccache_new_attribute_set(Oid tableoid,
+										   Bitmapset *required,
+										   Bitmapset *existing);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+extern void untrack_ccache_locally(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..357fbfb
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1553 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for columns-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/barrier.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	dlist_head		slots[FLEXIBLE_ARRAY_MEMBER];
+} ccache_hash;
+
+/*
+ * shmseg_head
+ *
+ * A data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory segment on its startup time, then it shall be split into
+ * multiple fixed-length memory blocks. All (internal) memory allocation and
+ * release shall be done by a block, to avoid memory fragmentation that
+ * eventually makes implementation complicated.
+ *
+ * The shmseg_head has a spinlock and global free_list to link free blocks.
+ * Any elements in its blocks[] array represents the state of a particular
+ * block being associated with. If it is chained to the free_list, it means
+ * this block is not allocated yet. Elsewhere, it is allocated to someone,
+ * thus unavailable to allocate it.
+ *
+ * A block-mapped region is dealt with a ccache_chunk. This structure has
+ * some fixed-length field and variable length array to store pointers of
+ * HeapTupleData. This array will grow up from the head to tail direction
+ * according to the number of tuples being stored on the block. On the
+ * other hands, contents of heap-tuple shall be put on the tail of blocks,
+ * then its usage will grow up from the tail to head direction.
+ * Thus, a chunk (= a block) can store multiple heap-tuples unless its
+ * usage for the pointer array does not cross its usage for the contents
+ * of heap-tuples.
+ *
+ * [layout of a block]
+ * +------------------------+  +0
+ * | struct ccache_chunk {  |
+ * |       :                |
+ * |       :                |
+ * |   HeapTuple tuples[];  |
+ * | };    :                |
+ * |       |                |
+ * |       v                |
+ * |                        |
+ * |                        |
+ * |       ^                |
+ * |       |                |
+ * |   buffer for           |
+ * | tuple contents         |
+ * |       |                |
+ * |       :                |
+ * +------------------------+  +(shmseg_blocksize - 1)
+ */
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	dlist_node		blocks[FLEXIBLE_ARRAY_MEMBER];
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+						(Size)shmseg_num_blocks * (Size)shmseg_blocksize)))
+
+/*
+ * cchunk_sanity_check - for debugging
+ */
+static void
+cchunk_sanity_check(ccache_chunk *cchunk)
+{
+#ifdef USE_ASSERT_CHECKING
+	ccache_chunk   *uchunk = cchunk->upper;
+
+	Assert(!uchunk || uchunk->left == cchunk || uchunk->right == cchunk);
+	AssertIfNotShmem(cchunk->right);
+	AssertIfNotShmem(cchunk->left);
+
+	Assert(cchunk->usage <= shmseg_blocksize);
+	Assert(offsetof(ccache_chunk, tuples[cchunk->ntups]) <= cchunk->usage);
+#if 0	/* more nervous sanity checks */
+	{
+		int		i;
+		for (i=0; i < cchunk->ntups; i++)
+		{
+			HeapTuple	tuple = cchunk->tuples[i];
+
+			Assert(tuple != NULL &&
+				   (ulong)tuple >= (ulong)(&cchunk->tuples[cchunk->ntups]) &&
+				   (ulong)tuple < (ulong)cchunk + shmseg_blocksize);
+			Assert(tuple->t_data != NULL &&
+				   (ulong)tuple->t_data >= (ulong)tuple &&
+				   (ulong)tuple->t_data < (ulong)cchunk + shmseg_blocksize);
+		}
+	}
+#endif
+#endif
+}
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * It is a callback to put ccache_head being acquired locally, to keep
+ * consistency of reference counter.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+		cchunk->deadspace = 0;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == 0 ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+		if (cchunk->right)
+			cchunk->right->upper = cchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+		cchunk->upper = rchunk;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			Assert(upper->right == cchunk);
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+		if (cchunk->left)
+			cchunk->left->upper = cchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+		cchunk->upper = lchunk;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		else
+		{
+			Assert(upper->left == cchunk);
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+	cchunk_sanity_check(cchunk);
+}
+
+/* it computes "actual" free space we can use right now */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+/* it computes "expected" free space we can use if compaction */
+#define cchunk_availablespace(cchunk)	\
+	(cchunk_freespace(cchunk) + (cchunk)->deadspace)
+
+/*
+ * ccache_chunk_compaction
+ *
+ * It moves existing tuples to eliminate dead spaces of the chunk.
+ * Eventually, chunk's deadspace shall become zero.
+ */
+static void
+ccache_chunk_compaction(ccache_chunk *cchunk)
+{
+	ccache_chunk   *temp = alloca(shmseg_blocksize);
+	int				i;
+
+	/* setting up temporary chunk */
+	temp->upper		= cchunk->upper;
+	temp->right		= cchunk->right;
+	temp->left		= cchunk->left;
+	temp->r_depth	= cchunk->r_depth;
+	temp->l_depth	= cchunk->l_depth;
+	temp->ntups		= cchunk->ntups;
+	temp->usage		= shmseg_blocksize;
+	temp->deadspace	= 0;
+
+	for (i=0; i < cchunk->ntups; i++)
+	{
+		HeapTuple	tuple = cchunk->tuples[i];
+		HeapTuple	dest;
+		uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+		uint32		offset;
+
+		Assert(required <= cchunk_freespace(temp));
+
+		temp->usage -= required;
+		offset = temp->usage;
+
+		/*
+		 * Even though we place the body of HeapTupleHeaderData just after
+		 * HeapTupleData, there is usually no guarantee that both data
+		 * structures are located at contiguous memory addresses.
+		 * So we explicitly adjust tuple->t_data to point to the area just
+		 * behind itself, so that a HeapTuple on the columnar cache can be
+		 * referenced like a regular one.
+		 */
+		dest = (HeapTuple)((char *)temp + offset);
+		dest->t_data = (HeapTupleHeader)((char *)dest + HEAPTUPLESIZE);
+		memcpy(dest, tuple, HEAPTUPLESIZE);
+		memcpy(dest->t_data, tuple->t_data, tuple->t_len);
+
+		temp->tuples[i] = (HeapTuple)((char *)cchunk + offset);
+	}
+	elog(LOG, "chunk (%p) compaction: freespace %u -> %u",
+		 cchunk, cchunk_freespace(temp), cchunk_freespace(cchunk));
+	memcpy(cchunk, temp, shmseg_blocksize);
+	cchunk_sanity_check(cchunk);
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple, but uncached columns are dropped off,
+ * onto the ccache_head. If no space is left, it expands the t-tree
+ * structure with a chunk newly allocated. If no shared memory space was
+ * left, it returns false.
+ */
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+	if (required > cchunk_freespace(cchunk))
+		ccache_chunk_compaction(cchunk);
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		memmove(&cchunk->tuples[i_min + 1],
+				&cchunk->tuples[i_min],
+				sizeof(HeapTuple) * (cchunk->ntups - i_min));
+	}
+	cchunk->usage -= required;
+	newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+	memcpy(newtup, tuple, HEAPTUPLESIZE);
+	newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+	memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+	cchunk->tuples[i_min] = newtup;
+	cchunk->ntups++;
+
+	cchunk_sanity_check(cchunk);
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->deadspace += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple that satisfies the supplied ItemPointer according to
+ * the ScanDirection. If NoMovementScanDirection, it returns a tuple that
+ * has strictly same ItemPointer. On the other hand, it returns a tuple
+ * that has the least ItemPointer greater than the supplied one if
+ * ForwardScanDirection, and also returns a tuple with the greatest
+ * ItemPointer smaller than the supplied one if BackwardScanDirection.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the properties of a tuple that is already cached,
+ * usually for deletion.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_right_chunk
+ *
+ * It tries to find out the chunk with the least ctids under the target
+ * (the leftmost descendant of the right sub-tree). If all of its tuples
+ * fit into the available space of cchunk, they are copied into cchunk,
+ * then the merged chunk is detached from the t-tree and its shared
+ * memory block is released. It returns true if a merge happened.
+ */
+static bool
+ccache_merge_right_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->left)
+		{
+			target = target->left;
+			continue;
+		}
+
+		required = (shmseg_blocksize - target->usage - target->deadspace +
+					sizeof(HeapTuple) * target->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[cchunk->ntups++] = newtup;
+			}
+
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->right;
+				upper->r_depth = target->r_depth;
+			}
+			else
+			{
+				upper->left = target->right;
+				upper->l_depth = target->r_depth;
+			}
+			if (target->right)
+				target->right->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xdeadbeaf, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		break;
+	}
+	return result;
+}
+
+static bool
+ccache_merge_left_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+	
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->right)
+		{
+			target = target->right;
+			continue;
+		}
+
+	    required = (shmseg_blocksize - target->usage - target->deadspace +
+					sizeof(HeapTuple) * target->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			memmove(&cchunk->tuples[target->ntups],
+					&cchunk->tuples[0],
+					sizeof(HeapTuple) * cchunk->ntups);
+			cchunk->ntups += target->ntups;
+
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[i] = newtup;
+			}
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->left;
+				upper->r_depth = target->l_depth;
+			}
+			else
+			{
+				upper->left = target->left;
+				upper->l_depth = target->l_depth;
+			}
+			if (target->left)
+				target->left->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xfee1dead, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		cchunk_sanity_check(cchunk);
+		break;
+	}
+	return result;
+}
+
+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims tuples that have already been vacuumed. It is invoked from
+ * the heap_page_prune_hook callback to keep the contents of the cache in
+ * sync with the on-disk image.
+ */
+static ccache_chunk *
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+		{
+			ccache_chunk   *vchunk
+				= ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+			/*
+			 * If vacuumed chunk has no right child, it means this chunk
+			 * is the greatest one in the chunks with less ctid than the
+			 * current chunk, so it may be able to be merged if enough
+			 * space is here.
+			 */
+			if (vchunk && !vchunk->right)
+				ccache_merge_left_chunk(cchunk, vchunk);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+		{
+			ccache_chunk   *vchunk
+				= ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+			/*
+			 * If vacuumed chunk has no left child, it means this chunk
+			 * is the least one in the chunks with greater ctid than the
+			 * current chunk, so it may be able to be merged if enough
+			 * space is here.
+			 */
+			if (vchunk && !vchunk->left)
+				ccache_merge_right_chunk(cchunk, vchunk);
+		}
+	}
+	else
+	{
+		bool	rebalance;
+
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max && i_min < cchunk->ntups);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+			int			j;
+
+			for (j=i_min+1; j < cchunk->ntups; j++)
+				cchunk->tuples[j-1] = cchunk->tuples[j];
+			cchunk->ntups--;
+			cchunk->deadspace += length;
+		}
+		else
+			elog(LOG, "ctid (%u,%u) was not on columnar cache",
+				 ItemPointerGetBlockNumber(ctid),
+				 ItemPointerGetOffsetNumber(ctid));
+
+		rebalance = false;
+		if (cchunk->left)
+			rebalance |= ccache_merge_left_chunk(cchunk, cchunk->left);
+		if (cchunk->right)
+			rebalance |= ccache_merge_right_chunk(cchunk, cchunk->right);
+		if (rebalance)
+			ccache_rebalance_tree(ccache, cchunk);
+
+		return cchunk;
+	}
+	return NULL;
+}
+
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* Note that it needs buffer being valid and pinned */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		/*
+		 * If no free ccache_entry is available, we construct a new one
+		 * on demand to track the locally acquired columnar cache.
+		 * Because getting/putting a columnar cache is a very frequent
+		 * job, we allocate tracking entries in TopMemoryContext for
+		 * reuse, instead of allocating one for each operation.
+		 */
+		PG_TRY();
+		{
+			entry = MemoryContextAlloc(TopMemoryContext,
+									   sizeof(ccache_entry));
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+
+	/*
+	 * Here is no columnar cache of this relation or cache attributes are
+	 * not enough to run the required query. So, it tries to create a new
+	 * ccache_head for the upcoming cache-scan.
+	 * Also allocate ones, if we have no free ccache_head any more.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	LWLockInitialize(&new_cache->lock, 0);
+	new_cache->refcnt = 1;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find out existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			/* chose a set of columns to be cached */
+			if (old_cache)
+				attrs_used = ccache_new_attribute_set(tableoid,
+													  attrs_used,
+													  &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(&ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(&ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. Variable-length allocation is
+ * intentionally not supported, to keep the logic simple.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	void		   *address = NULL;
+	int				index;
+	int				retry = 2;
+
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			pg_memory_barrier();
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache_nolock(ccache);
+			else
+				dlist_move_head(&cs_ccache_hash->lru_list, &ccache->lru_chain);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+
+		index = dnode - cs_shmseg_head->blocks;
+		Assert(index >= 0 && index < shmseg_num_blocks);
+
+		memset(dnode, 0, sizeof(dlist_node));
+		address = (void *)((char *)cs_shmseg_head->base_address + 
+						   index * shmseg_blocksize);
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size		curr = (Size) address;
+	Size		base = cs_shmseg_head->base_address;
+	ulong		index;
+	dlist_node *dnode;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	dnode = &cs_shmseg_head->blocks[index];
+	Assert(dnode->prev == NULL && dnode->next == NULL);
+
+	dlist_push_head(&cs_shmseg_head->free_list, dnode);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	int		i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash
+		= ShmemInitStruct("cache_scan: hash of columnar cache",
+						  MAXALIGN(offsetof(ccache_hash,
+											slots[ccache_hash_size])),
+						  &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 (Size)shmseg_num_blocks *
+									 (Size)shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+	cs_shmseg_head->base_address
+		= MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		dlist_push_tail(&cs_shmseg_head->free_list,
+						&cs_shmseg_head->blocks[i]);
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							128,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   (Size)shmseg_num_blocks * (Size)shmseg_blocksize);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..9fea6ee
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,929 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing column
+ * oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool		cache_scan_enabled;
+static double	cache_scan_width_threshold;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row references are not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Try to get an existing cache. If one exists, we assume it will probably
+	 * still be available at the time this plan is executed.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		double	usage_ratio;
+		int		total_width = 0;
+		int		tuple_width = 0;
+
+		/*
+		 * Estimate the width of the columns to be cached - it does not make
+		 * sense to construct a new cache if the cached portion exceeds the
+		 * configured threshold of the average record width; 30% by default.
+		 */
+		for (i=0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = tupdesc->attrs[i];
+			int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+			int		width;
+
+			if (attr->attlen > 0)
+				width = attr->attlen;
+			else
+				width = get_attavgwidth(tableoid, attr->attnum);
+
+			total_width += width;
+			if (bms_is_member(attidx, *attrs_used))
+				tuple_width += width;
+		}
+		usage_ratio = (double)tuple_width / (double)total_width;
+		if (usage_ratio > cache_scan_width_threshold / 100.0)
+			return false;
+
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		/*
+		 * If an existing cache already holds all the required attributes,
+		 * we don't need to care about the width of the cached columns
+		 * (it is obviously below the threshold).
+		 */
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * ccache_new_attribute_set
+ *
+ * It selects the attributes to be cached. When some of the newly required
+ * attributes are not cached yet, we reconstruct the cache with the union of
+ * the attribute sets, as long as its width does not grow beyond the
+ * configured threshold. If the (required | existing) set is wider than the
+ * threshold, we drop attributes from (~required & existing).
+ * Usually the total width of the required columns is below the threshold,
+ * thanks to the checks made at planning time.
+ */
+Bitmapset *
+ccache_new_attribute_set(Oid tableoid,
+						 Bitmapset *required, Bitmapset *existing)
+{
+	Form_pg_class	relform;
+	HeapTuple		reltup;
+	Bitmapset	   *difference;
+	int			   *attrs_width;
+	int				i, anum;
+	int				total_width;
+	int				required_width;
+	int				union_width;
+	double			usage_ratio;
+
+	reltup = SearchSysCache1(RELOID, ObjectIdGetDatum(tableoid));
+	if (!HeapTupleIsValid(reltup))
+		elog(ERROR, "cache lookup failed for relation %u", tableoid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	attrs_width = palloc0(sizeof(int) * relform->relnatts);
+
+	total_width = 0;
+	required_width = 0;
+	union_width = 0;
+	for (anum = 1; anum <= relform->relnatts; anum++)
+	{
+		Form_pg_attribute	attform;
+		HeapTuple			atttup;
+
+		atttup = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(tableoid),
+								 Int16GetDatum(anum));
+		if (!HeapTupleIsValid(atttup))
+			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+				 anum, tableoid);
+		attform = (Form_pg_attribute) GETSTRUCT(atttup);
+
+		if (attform->attisdropped)
+		{
+			ReleaseSysCache(atttup);
+			continue;
+		}
+
+		if (attform->attlen > 0)
+			attrs_width[anum - 1] = attform->attlen;
+		else
+			attrs_width[anum - 1] = get_attavgwidth(tableoid, anum);
+
+		total_width += attrs_width[anum - 1];
+		i = anum - FirstLowInvalidHeapAttributeNumber;
+		if (bms_is_member(i, required))
+		{
+			required_width += attrs_width[anum - 1];
+			union_width += attrs_width[anum - 1];
+		}
+		else if (bms_is_member(i, existing))
+			union_width += attrs_width[anum - 1];
+
+		ReleaseSysCache(atttup);
+	}
+	ReleaseSysCache(reltup);
+
+	/*
+	 * An easy case: if the union of attributes still fits under the
+	 * threshold, we don't need to drop any columns; just cache the union.
+	 */
+	usage_ratio = (double) union_width / (double) total_width;
+	if (usage_ratio <= cache_scan_width_threshold / 100.0)
+		return bms_union(required, existing);
+
+	/*
+	 * Otherwise, we repeatedly drop the widest column that is not referenced
+	 * by the upcoming query, until the width of the cache falls under the
+	 * threshold.
+	 */
+	difference = bms_difference(existing, required);
+	do {
+		Bitmapset  *tempset = bms_copy(difference);
+		int			maxwidth = -1;
+		AttrNumber	maxwidth_anum = 0;
+
+		Assert(!bms_is_empty(tempset));
+		union_width = required_width;
+		while ((i = bms_first_member(tempset)) >= 0)
+		{
+			anum = i + FirstLowInvalidHeapAttributeNumber;	/* bitmap index -> attnum */
+
+			union_width += attrs_width[anum - 1];
+			if (attrs_width[anum - 1] > maxwidth)
+			{
+				maxwidth = attrs_width[anum - 1];
+				maxwidth_anum = anum;
+			}
+		}
+		pfree(tempset);
+
+		/* drop a column that has largest length */
+		Assert(maxwidth_anum > 0);
+		i = maxwidth_anum - FirstLowInvalidHeapAttributeNumber;
+		difference = bms_del_member(difference, i);
+		union_width -= maxwidth;
+
+		usage_ratio = (double) union_width / (double) total_width;
+	} while (usage_ratio > cache_scan_width_threshold / 100.0);
+
+	pfree(attrs_width);
+
+	return bms_union(required, difference);
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have a columnar cache also needs synchronizer triggers,
+ * to ensure the on-memory cache tracks the latest contents of the heap.
+ * This returns TRUE if the supplied relation has triggers that invoke
+ * cache_scan_synchronizer in the appropriate contexts; otherwise it
+ * returns FALSE.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (!cache_scan_enabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* The core code should already have acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache scan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+        required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* no special push-down handling here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	/* Do nothing if EXPLAIN without ANALYZE */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_SHARED);
+		if (ccache->status < CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(&ccache->lock);
+			LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(&ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch a tuple that is invisible to us, because the columnar cache
+ * stores all live tuples, including ones updated or deleted by concurrent
+ * sessions. So it is the caller's job to check MVCC visibility.
+ * This routine decides whether we need to move on to the next tuple because
+ * of the visibility check. If the given tuple is NULL, it is obviously time
+ * to stop searching because there are no more tuples in the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility ? true : false;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		if (!ccache)
+		{
+			/*
+			 * ccache == NULL covers two cases: (1) a fallback path using a
+			 * regular sequential scan instead of a cache-only scan, and
+			 * (2) cache construction failed during the scan. We need to pay
+			 * attention to the latter case because that scan uses SnapshotAny,
+			 * so it fetches all tuples, including invisible ones.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+			buffer = scan->rs_cbuf;
+		}
+		else if (csstate->with_construction)
+		{
+			/*
+			 * "with_construction" means the columnar cache is under
+			 * construction, so we fetch a tuple from the heap of the
+			 * target relation and insert it into the cache.
+			 * Note that we use SnapshotAny to fetch all tuples, both
+			 * visible and invisible, so it is our responsibility to
+			 * check tuple visibility against the snapshot of the
+			 * current estate.
+			 * The same holds when we fetch tuples from the cache,
+			 * without referencing a heap buffer.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			if (HeapTupleIsValid(tuple))
+			{
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(&ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple fails, it usually means
+					 * we ran out of shared memory and cannot continue
+					 * constructing the columnar cache.
+					 * So we drop our reference while the cache is
+					 * still in the in-progress state; that prevents
+					 * others from grabbing it again, and we fall back
+					 * to a regular sequential scan for the remaining
+					 * portion.
+					 */
+					cs_put_ccache(ccache);
+					LWLockRelease(&ccache->lock);
+					csstate->ccache = NULL;
+				}
+				buffer = scan->rs_cbuf;
+			}
+			else
+			{
+				/*
+				 * Once we reach the end of the relation, the
+				 * columnar cache is fully constructed.
+				 */
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(&ccache->lock);
+				buffer = scan->rs_cbuf;
+			}
+		}
+		else
+		{
+			LWLockAcquire(&ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(&ccache->lock);
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	/* nothing to cleanup, if EXPLAIN without ANALYZE */
+	if (!csstate)
+		return;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(&ccache->lock);
+
+		/*
+		 * If the columnar cache is still in the "in-progress" state, the
+		 * table scan did not reach the end of the relation, so the cache
+		 * was not completely constructed and has to be dropped.
+		 * Otherwise, we keep the ccache that was originally created with
+		 * refcnt=1, and merely untrack it locally.
+		 */
+		if (needs_remove || !csstate->with_construction)
+			cs_put_ccache(ccache);
+		else if (csstate->with_construction)
+			untrack_ccache_locally(ccache);
+	}
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * trigger function to synchronize the columnar-cache with heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+                 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+                 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(&ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache when the cached table is altered or
+ * dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE needs cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * Callback invoked when a particular heap block is pruned or vacuumed.
+ * On vacuuming, the space occupied by dead tuples is reclaimed and tuple
+ * locations may move.
+ * This routine reclaims the corresponding space held by dead tuples in the
+ * columnar cache, following the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	/*
+	 * If the relation already has a columnar cache, it also needs to be
+	 * cleaned up in step with the heap vacuuming.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(&ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.enabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_enabled,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	DefineCustomRealVariable("cache_scan.width_threshold",
+							 "threshold percentage to be cached",
+							 NULL,
+							 &cache_scan_width_threshold,
+							 30.0,
+							 0.0,
+							 100.0,
+							 PGC_SIGHUP,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..df8d0de
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,266 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache-scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap
+   scan, when a previous scan has already loaded the contents of the table
+   into the cache.
+   Unlike the buffer cache, it holds only a limited set of columns rather
+   than whole records, so it can keep a larger number of records in the same
+   amount of RAM. This characteristic is mainly useful for analytic queries
+   on tables with many columns and records.
+  </para>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   This allows it to offer an additional scan path on regular relations that
+   uses the on-memory columnar cache instead of a regular heap scan.
+   It also serves as a proof-of-concept implementation of the custom-scan
+   API, which allows the core executor system to be extended.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded via the
+   <xref linkend="guc-shared-preload-libraries"> parameter so that it can
+   acquire its shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented by the <literal>cache_scan_synchronizer</>
+   function, which keep the cache contents in sync with the latest heap
+   contents on <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> or
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can set up this extension with the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create the synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create the synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
+    </para>
+   </step>
+  </procedure>
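+  <para>
+   As a quick sanity check (this is not required by the module itself), you
+   can confirm that the synchronizer triggers are in place by querying
+   <literal>pg_trigger</>:
+<programlisting>
+SELECT tgname FROM pg_trigger WHERE tgrelid = 't1'::regclass;
+</programlisting>
+  </para>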
+ </sect2>
+
+ <sect2>
+  <title>How it works</title>
+  <para>
+   This module behaves in the usual fashion of <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has
+   synchronizer triggers and the width of the referenced columns is less
+   than 30% of the average record width.
+   The query optimizer then picks the cheapest path. If the chosen path is
+   a custom-scan path managed by <filename>cache_scan</>, the scan runs on
+   the target relation using the columnar cache.
+   The first time it runs, it constructs the relation's cache alongside a
+   regular sequential scan. On subsequent runs, it can scan the columnar
+   cache without referencing the heap at all.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> using
+   <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, consists of one
+   or more chunks that act as the nodes and leaves of a T-tree structure.
+   The <literal>cache_scan_debuginfo()</> function dumps useful information
+   about all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ct
+id
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each chunk holds
+   an array of partial tuples together with their minimum and maximum
+   <literal>ctid</> values. A chunk's left node links to chunks holding tuples with
+   smaller <literal>ctid</> values, and its right node links to chunks holding
+   larger ones.
+   This makes it quick to find the tuples that have to be invalidated when the heap
+   is updated by DDL, DML or vacuuming.
+  </para>
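+  <para>
+   For instance, to look at the chunks of one table only (assuming the table
+   used in the examples above is named <literal>t1</>), the output of
+   <literal>cache_scan_debuginfo()</> can simply be filtered:
+<programlisting>
+SELECT chunk, ntuples, usage, min_ctid, max_ctid
+  FROM cache_scan_debuginfo()
+ WHERE tableoid = 't1'::regclass;
+</programlisting>
+  </para>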
+  <para>
+   A columnar cache is not owned by a particular session, so it is retained
+   until it is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of each block in the shared memory
+      segment used for the columnar cache. Changing it requires a postmaster
+      restart to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for columnar caches on demand.
+      Too large a block size reduces the flexibility of memory assignment,
+      while too small a block size wastes space on per-block management.
+      So we recommend keeping the default value, which is 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks in the shared memory
+      segment used for the columnar cache. Changing it requires a postmaster
+      restart to take effect.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires <literal>cache_scan.num_blocks</>
+      x <literal>cache_scan.block_size</> bytes of shared memory at startup
+      time, then allocates blocks for columnar caches on demand.
+      Too small a number of blocks reduces the flexibility of memory
+      assignment and may cause unwanted cache drops.
+      So we recommend configuring enough blocks to keep the contents of the
+      target relations in memory.
+      The default is <literal>64</literal>, which is probably too small for
+      most real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots in the internal hash table
+      that tracks every columnar cache, hashed by table OID.
+      The default is <literal>128</>; there is usually no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number that can be
+      cached in a columnar cache. Because of the internal data representation,
+      the bitmapset used to track the cached attributes has to be of fixed
+      length, so the largest attribute number must be fixed in advance.
+      The default is <literal>128</>, which should be sufficient because most
+      tables have fewer than 100 columns.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-enabled" xreflabel="cache_scan.enabled">
+    <term><varname>cache_scan.enabled</> (<type>boolean</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.enabled</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter enables or disables the query planner's use of
+      cache-only scans, even when one is ready to run.
+      Note that this parameter does not affect the synchronizer triggers, so
+      an already constructed columnar cache keeps being synchronized even if
+      cache-only scans are disabled later.
+      The default is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-width_threshold" xreflabel="cache_scan.width_threshold">
+    <term><varname>cache_scan.width_threshold</> (<type>float</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.width_threshold</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the threshold at which a cache-only scan plan
+      is proposed to the planner. (If the proposed scan plan is cheap enough,
+      the planner will choose it instead of the built-in ones.)
+      The extension proposes a cache-only scan plan only if the combined
+      width of the referenced columns is less than this percentage of the
+      average record width.
+      The default is <literal>30.0</>, which means a cache-only scan plan is
+      proposed to the planner if the sum of the widths of the referenced
+      columns is less than
+      <literal>(30.0 / 100.0) x (average width of table)</>.
+      For example, on a table whose average record width is 256 bytes, only
+      scans whose referenced columns total less than about 77 bytes are
+      considered.
+     </para>
+     <para>
+      The columnar cache only pays off when the width of the cached columns
+      is much smaller than the total record width; the threshold keeps scans
+      that reference many columns from consuming a significant amount of
+      shared memory, which would eventually kill the benefit.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 336ba0c..fdc4ba3 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &cube;
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..db125c7 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -55,6 +55,20 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term><xref linkend="cache-scan"></term>
+    <listitem>
+     <para>
+      This custom scan in this module enables a scan refering the on-memory
+      columner cache instead of the heap, if the target relation already has
+      this cache being constructed already.
+      Unlike buffer cache, it holds limited number of columns that have been
+      referenced before, but not all the columns in the table definition.
+      Thus, it allows to cache much larger number of records on-memory than
+      buffer cache.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </para>
  <para>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d63b1a8..b75d7df 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case the supplied HeapTuple is not associated with a particular
+ * buffer, this just returns without doing anything. That can happen when
+ * an extension caches tuples in its own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
#17Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#16)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Sorry, the previous one still has "columner" in the sgml files.
Please see the attached one, instead.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Tuesday, March 04, 2014 12:35 PM
To: Haribabu Kommi; Kohei KaiGai
Cc: Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
table scan?)

Thanks for your review.

According to the discussion in the Custom-Scan API thread, I moved all the
supplemental facilities (like bms_to/from_string) into the main patch, so
you don't need to apply the ctidscan and postgres_fdw patches for testing
any more. (I'll submit the revised one later.)

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,

+ tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different but in case of
ccache, it is a continuous memory.
Better write a comment as even if it continuous memory, it is
treated as different only.

OK, I put a source code comment as follows:

/*
* Even though we place the body of HeapTupleHeaderData just after
* HeapTupleData here, in general there is no guarantee that both data
* structures are located at contiguous memory addresses.
* So we explicitly adjust tuple->t_data to point at the area just behind
* the HeapTupleData, so that a HeapTuple on the columnar cache can be
* referenced just like a regular one.
*/
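
To illustrate the pattern that comment describes, the flattened copy looks
roughly like this (a minimal sketch; the function name is made up for
illustration, this is not the patch code itself):

#include "postgres.h"
#include "access/htup.h"

/*
 * Copy a HeapTuple into a single flat buffer of HEAPTUPLESIZE +
 * tuple->t_len bytes, then repoint t_data at the body that immediately
 * follows the HeapTupleData header.
 */
static HeapTuple
flatten_tuple_example(HeapTuple tuple, void *dest)
{
	HeapTuple	copy = (HeapTuple) dest;

	memcpy(copy, tuple, HEAPTUPLESIZE);
	memcpy((char *) copy + HEAPTUPLESIZE, tuple->t_data, tuple->t_len);
	copy->t_data = (HeapTupleHeader) ((char *) copy + HEAPTUPLESIZE);

	return copy;
}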

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already maxaligned. No problem of using it again, The
required length calculation is differing function to function.
For example, in below part of the same function, the same t_len is
used directly. It didn't generate any problem, but it may give some

confusion.

I initially tried to trust that t_len is already aligned, but an Assert()
showed that assumption is not right. See heap_compute_data_size(): it
computes the length of the tuple body and applies alignment according to the
"attalign" value of each pg_attribute entry, which is not necessarily the
same as sizeof(Datum).

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)

+ ccache_merge_chunk(ccache, pchunk);

+ pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread
across two cache chunks. Actually one cache chunk can accommodate one
or more than heap pages. it needs some other way of handling.

I adjusted the logic to merge the chunks as follows:

Once a tuple is vacuumed from a chunk, we also check whether the chunk can be
merged with its child leaves. A chunk has up to two children; the left one
holds ctids smaller than the parent's, and the right one holds greater ctids.
That means a chunk without a right child in the left sub-tree, or a chunk
without a left child in the right sub-tree, is a neighbor of the chunk being
vacuumed. In addition, if the vacuumed chunk lacks either (or both) of its
children, it can be merged with its parent node.
I modified ccache_vacuum_tuple() to merge chunks during the t-tree walk-down,
if the vacuumed chunk has enough free space.

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

I rethought this; there is no good reason to construct multiple
ccache_entries at once. So I adjusted the code to create a new ccache_entry
on demand, to track each columnar cache as it is acquired.

5. "columner" is present in sgml file. correct it.

Sorry, fixed it.

6. "max_cached_attnum" value in the document saying as 128 by default
but in the code it set as 256.

Sorry, fixed it.

Also, I ran a benchmark of cache_scan for the case where this module
performs most effectively.

The table t1 is declared as follows:
create table t1 (a int, b float, c float, d text, e date, f char(200));
Its rows are roughly 256 bytes wide and it contains 4 million records, so
the total table size is almost 1GB.
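
For reference, the table can be populated with something like this (any
filler data of roughly that shape will do; this exact statement is just an
illustration):

INSERT INTO t1
    SELECT i, random() * 100, random() * 100,
           md5(i::text), current_date, 'filler'
      FROM generate_series(1, 4000000) AS i;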

* 1st trial - it takes longer than a sequential scan because of columnar-cache construction

postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=200791.62..200791.64 rows=1 width=0) (actual time=63105.036..63105.037 rows=1 loops=1)
   ->  Custom Scan (cache scan) on t1  (cost=0.00..200741.62 rows=20000 width=0) (actual time=7.397..62832.728 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 214.506 ms
 Total runtime: 64629.296 ms
(6 rows)

* 2nd trial - it runs much faster than a sequential scan because of no disk access

postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=67457.53..67457.54 rows=1 width=0) (actual time=7833.313..7833.313 rows=1 loops=1)
   ->  Custom Scan (cache scan) on t1  (cost=0.00..67407.53 rows=20000 width=0) (actual time=0.154..7615.914 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 1.019 ms
 Total runtime: 7833.761 ms
(6 rows)

* 3rd trial - cache_scan turned off, so the planner chooses the built-in SeqScan

postgres=# set cache_scan.enabled = off;
SET
postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=208199.08..208199.09 rows=1 width=0) (actual time=59700.810..59700.810 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..208149.08 rows=20000 width=0) (actual time=715.489..59518.095 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 0.630 ms
 Total runtime: 59701.104 ms
(6 rows)

The reason for such an extreme result:
I constrained the operating system's page cache so that disk cache hits were
limited, which makes the sequential scan dominated by disk access performance
in this case. The columnar cache, on the other hand, could hold all of the
records because it omits unreferenced columns. With effectively only column
"a" cached, the 4 million rows fit easily in the columnar cache configured
below (400 blocks, about 800MB assuming the default 2MB block size), while
the ~1GB heap no longer fits in the constrained page cache.

* GUCs
shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 400

[kaigai@iwashi backend]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7986       7839        146          0          2        572
-/+ buffers/cache:       7265        721
Swap:         8079        265       7814

Please don't throw stones at me. :-)
The primary purpose of this extension is to demonstrate usage of the
custom-scan interface and heap_page_prune_hook().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
Sent: Monday, February 24, 2014 12:42 PM
To: Kohei KaiGai
Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for
cache-only table scan?)

On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Hello,

The attached patch is a revised one for cache-only scan module
on top of custom-scan interface. Please check it.

Thanks for the revised patch. Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,

+ tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different, but in the case of
ccache it is contiguous memory.
Better to write a comment noting that even though the memory is
contiguous, the two areas are still treated as distinct.

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already maxaligned. There is no problem in applying it again,
but the required-length calculation differs from function to function.
For example, in a later part of the same function, the same t_len is
used directly. It doesn't cause any problem, but it may cause some
confusion.

3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)

+ ccache_merge_chunk(ccache, pchunk);

+ pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread
across two cache chunks. Actually, one cache chunk can accommodate one
or more heap pages; it needs some other way of handling.

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

5. "columner" is present in sgml file. correct it.

6. "max_cached_attnum" value in the document saying as 128 by default
but in the code it set as 256.

I will start regression and performance tests, and will let you know the
results once I finish.

Regards,
Hari Babu

Fujitsu Australia

Attachments:

pgsql-v9.4-custom-scan.part-4.v9.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-4.v9.patchDownload
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   81 +
 contrib/cache_scan/ccache.c                        | 1553 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         |  929 ++++++++++++
 doc/src/sgml/cache-scan.sgml                       |  266 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/custom-scan.sgml                      |   14 +
 doc/src/sgml/filelist.sgml                         |    1 +
 src/backend/access/heap/pruneheap.c                |   13 +
 src/backend/utils/time/tqual.c                     |    7 +
 src/include/access/heapam.h                        |    7 +
 14 files changed, 2925 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..c409817
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..c9cb259
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,81 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greater node, if it exists */
+	struct ccache_chunk *left;	/* link to the lesser node, if it exists */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	uint32			deadspace;	/* payload by dead tuples */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+/*
+ * Status flag of columnar cache. A ccache_head is created with status of
+ * CCACHE_STATUS_INITIALIZED, then someone picks up the cache_head from
+ * the hash table and marks it as CCACHE_STATUS_IN_PROGRESS; that means
+ * this cache is under construction by a particular scan. Once it got
+ * constructed, it shall have CCACHE_STATUS_CONSTRUCTED state.
+ */
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLock			lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] or
+								 * free_list. Elsewhere, unlinked */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern Bitmapset *ccache_new_attribute_set(Oid tableoid,
+										   Bitmapset *required,
+										   Bitmapset *existing);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+extern void untrack_ccache_locally(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern void ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..357fbfb
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1553 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for the column-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/barrier.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	dlist_head		slots[FLEXIBLE_ARRAY_MEMBER];
+} ccache_hash;
+
+/*
+ * shmseg_head
+ *
+ * A data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory segment on its startup time, then it shall be split into
+ * multiple fixed-length memory blocks. All (internal) memory allocation and
+ * release shall be done by a block, to avoid memory fragmentation that
+ * eventually makes implementation complicated.
+ *
+ * The shmseg_head has a spinlock and global free_list to link free blocks.
+ * Each element of its blocks[] array represents the state of the particular
+ * block it is associated with. If it is chained to the free_list, it means
+ * this block is not allocated yet. Otherwise, it is allocated to someone,
+ * and thus unavailable for allocation.
+ *
+ * A block-mapped region is dealt with as a ccache_chunk. This structure has
+ * some fixed-length fields and a variable length array to store pointers of
+ * HeapTupleData. This array grows from the head towards the tail according
+ * to the number of tuples being stored on the block. On the other hand,
+ * the contents of heap-tuples are put at the tail of the block, so their
+ * usage grows from the tail towards the head.
+ * Thus, a chunk (= a block) can store multiple heap-tuples as long as the
+ * usage of the pointer array does not cross the usage for the contents
+ * of heap-tuples.
+ *
+ * [layout of a block]
+ * +------------------------+  +0
+ * | struct ccache_chunk {  |
+ * |       :                |
+ * |       :                |
+ * |   HeapTuple tuples[];  |
+ * | };    :                |
+ * |       |                |
+ * |       v                |
+ * |                        |
+ * |                        |
+ * |       ^                |
+ * |       |                |
+ * |   buffer for           |
+ * | tuple contents         |
+ * |       |                |
+ * |       :                |
+ * +------------------------+  +(shmseg_blocksize - 1)
+ */
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	dlist_node		blocks[FLEXIBLE_ARRAY_MEMBER];
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+						(Size)shmseg_num_blocks * (Size)shmseg_blocksize)))
+
+/*
+ * cchunk_sanity_check - for debugging
+ */
+static void
+cchunk_sanity_check(ccache_chunk *cchunk)
+{
+#ifdef USE_ASSERT_CHECKING
+	ccache_chunk   *uchunk = cchunk->upper;
+
+	Assert(!uchunk || uchunk->left == cchunk || uchunk->right == cchunk);
+	AssertIfNotShmem(cchunk->right);
+	AssertIfNotShmem(cchunk->left);
+
+	Assert(cchunk->usage <= shmseg_blocksize);
+	Assert(offsetof(ccache_chunk, tuples[cchunk->ntups]) <= cchunk->usage);
+#if 0	/* more nervous sanity checks */
+	{
+		int		i;
+		for (i=0; i < cchunk->ntups; i++)
+		{
+			HeapTuple	tuple = cchunk->tuples[i];
+
+			Assert(tuple != NULL &&
+				   (ulong)tuple >= (ulong)(&cchunk->tuples[cchunk->ntups]) &&
+				   (ulong)tuple < (ulong)cchunk + shmseg_blocksize);
+			Assert(tuple->t_data != NULL &&
+				   (ulong)tuple->t_data >= (ulong)tuple &&
+				   (ulong)tuple->t_data < (ulong)cchunk + shmseg_blocksize);
+		}
+	}
+#endif
+#endif
+}
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * It is a callback to put ccache_head being acquired locally, to keep
+ * consistency of reference counter.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+		cchunk->deadspace = 0;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It keeps the balance of ccache tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == 0 ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+		if (cchunk->right)
+			cchunk->right->upper = cchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+		cchunk->upper = rchunk;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			Assert(upper->right == cchunk);
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+		if (cchunk->left)
+			cchunk->left->upper = cchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+		cchunk->upper = lchunk;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		else
+		{
+			Assert(upper->left == cchunk);
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+	cchunk_sanity_check(cchunk);
+}
+
+/* it computes "actual" free space we can use right now */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups + 1]))
+/* it computes "expected" free space we can use if compaction */
+#define cchunk_availablespace(cchunk)	\
+	(cchunk_freespace(cchunk) + (cchunk)->deadspace)
+
+/*
+ * ccache_chunk_compaction
+ *
+ * It moves existing tuples to eliminate dead spaces of the chunk.
+ * Eventually, chunk's deadspace shall become zero.
+ */
+static void
+ccache_chunk_compaction(ccache_chunk *cchunk)
+{
+	ccache_chunk   *temp = alloca(shmseg_blocksize);
+	int				i;
+
+	/* setting up temporary chunk */
+	temp->upper		= cchunk->upper;
+	temp->right		= cchunk->right;
+	temp->left		= cchunk->left;
+	temp->r_depth	= cchunk->r_depth;
+	temp->l_depth	= cchunk->l_depth;
+	temp->ntups		= cchunk->ntups;
+	temp->usage		= shmseg_blocksize;
+	temp->deadspace	= 0;
+
+	for (i=0; i < cchunk->ntups; i++)
+	{
+		HeapTuple	tuple = cchunk->tuples[i];
+		HeapTuple	dest;
+		uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+		uint32		offset;
+
+		Assert(required <= cchunk_freespace(temp));
+
+		temp->usage -= required;
+		offset = temp->usage;
+
+		/*
+		 * Even though we put the body of HeapTupleHeaderData just after
+		 * HeapTupleData, usually, here is no guarantee that both of data
+		 * structures are located on continuous memory address.
+		 * So, we explicitly adjust tuple->t_data to point the area just
+		 * behind of itself, to reference the HeapTuple on columnar-cache
+		 * as like regular ones.
+		 */
+		dest = (HeapTuple)((char *)temp + offset);
+		dest->t_data = (HeapTupleHeader)((char *)dest + HEAPTUPLESIZE);
+		memcpy(dest, tuple, HEAPTUPLESIZE);
+		memcpy(dest->t_data, tuple->t_data, tuple->t_len);
+
+		temp->tuples[i] = (HeapTuple)((char *)cchunk + offset);
+	}
+	elog(LOG, "chunk (%p) compaction: freespace %u -> %u",
+		 cchunk, cchunk_freespace(temp), cchunk_freespace(cchunk));
+	memcpy(cchunk, temp, shmseg_blocksize);
+	cchunk_sanity_check(cchunk);
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple, but uncached columns are dropped off,
+ * onto the ccache_head. If no space is left, it expands the t-tree
+ * structure with a chunk newly allocated. If no shared memory space was
+ * left, it returns false.
+ */
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+	if (required > cchunk_freespace(cchunk))
+		ccache_chunk_compaction(cchunk);
+	Assert(required <= cchunk_freespace(cchunk));
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		memmove(&cchunk->tuples[i_min + 1],
+				&cchunk->tuples[i_min],
+				sizeof(HeapTuple) * (cchunk->ntups - i_min));
+	}
+	cchunk->usage -= required;
+	newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+	memcpy(newtup, tuple, HEAPTUPLESIZE);
+	newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+	memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+	cchunk->tuples[i_min] = newtup;
+	cchunk->ntups++;
+
+	cchunk_sanity_check(cchunk);
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right && required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required <= cchunk_availablespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->deadspace += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple that satisfies the supplied ItemPointer according to
+ * the ScanDirection. If NoMovementScanDirection, it returns a tuple that
+ * has strictly same ItemPointer. On the other hand, it returns a tuple
+ * that has the least ItemPointer greater than the supplied one if
+ * ForwardScanDirection, and also returns a tuple with the greatest
+ * ItemPointer smaller than the supplied one if BackwardScanDirection.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the properties of a tuple that is already cached,
+ * usually for deletion.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_right_chunk
+ *
+ * It tries to find out the chunk with the least ctid in the right branch
+ * (the leftmost leaf under 'target'). If 'cchunk' has enough free space,
+ * the tuples of that chunk are moved into 'cchunk', the merged chunk is
+ * detached from the t-tree, and its block is released.
+ */
+static bool
+ccache_merge_right_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->left)
+		{
+			target = target->left;
+			continue;
+		}
+
+		required = (shmseg_blocksize - target->usage - target->deadspace +
+					sizeof(HeapTuple) * target->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[cchunk->ntups++] = newtup;
+			}
+
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->right;
+				upper->r_depth = target->r_depth;
+			}
+			else
+			{
+				upper->left = target->right;
+				upper->l_depth = target->r_depth;
+			}
+			if (target->right)
+				target->right->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xdeadbeaf, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		break;
+	}
+	return result;
+}
+
+static bool
+ccache_merge_left_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+	
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->right)
+		{
+			target = target->right;
+			continue;
+		}
+
+	    required = (shmseg_blocksize - target->usage - target->deadspace +
+					sizeof(HeapTuple) * target->ntups);
+		if (required <= cchunk_availablespace(cchunk))
+		{
+			if (required > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			memmove(&cchunk->tuples[target->ntups],
+					&cchunk->tuples[0],
+					sizeof(HeapTuple) * cchunk->ntups);
+			cchunk->ntups += target->ntups;
+
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[i] = newtup;
+			}
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->left;
+				upper->r_depth = target->l_depth;
+			}
+			else
+			{
+				upper->left = target->left;
+				upper->l_depth = target->l_depth;
+			}
+			if (target->left)
+				target->left->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xfee1dead, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		cchunk_sanity_check(cchunk);
+		break;
+	}
+	return result;
+}
+
+/*
+ * ccache_vacuum_tuple
+ *
+ * It reclaims a tuple that has already been vacuumed, removing it from the
+ * t-tree and merging neighbor chunks when enough free space is available.
+ * It is called from ccache_vacuum_page() below for each reclaimed item.
+ */
+static ccache_chunk *
+ccache_vacuum_tuple(ccache_head *ccache,
+					ccache_chunk *cchunk,
+					ItemPointer ctid)
+{
+	ItemPointer	min_ctid;
+	ItemPointer	max_ctid;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (cchunk->left)
+		{
+			ccache_chunk   *vchunk
+				= ccache_vacuum_tuple(ccache, cchunk->left, ctid);
+			/*
+			 * If vacuumed chunk has no right child, it means this chunk
+			 * is the greatest one in the chunks with less ctid than the
+			 * current chunk, so it may be able to be merged if enough
+			 * space is here.
+			 */
+			if (vchunk && !vchunk->right)
+				ccache_merge_left_chunk(cchunk, vchunk);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (cchunk->right)
+		{
+			ccache_chunk   *vchunk
+				= ccache_vacuum_tuple(ccache, cchunk->right, ctid);
+			/*
+			 * If vacuumed chunk has no left child, it means this chunk
+			 * is the least one in the chunks with greater ctid than the
+			 * current chunk, so it may be able to be merged if enough
+			 * space is here.
+			 */
+			if (vchunk && !vchunk->left)
+				ccache_merge_right_chunk(cchunk, vchunk);
+		}
+	}
+	else
+	{
+		bool	rebalance;
+
+		while (i_min < i_max)
+		{
+			int	i_mid = (i_min + i_max) / 2;
+
+			if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+				i_max = i_mid;
+			else
+				i_min = i_mid + 1;
+		}
+		Assert(i_min == i_max && i_min < cchunk->ntups);
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+		{
+			HeapTuple	tuple = cchunk->tuples[i_min];
+			int			length = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+			int			j;
+
+			for (j=i_min+1; j < cchunk->ntups; j++)
+				cchunk->tuples[j-1] = cchunk->tuples[j];
+			cchunk->ntups--;
+			cchunk->deadspace += length;
+		}
+		else
+			elog(LOG, "ctid (%u,%u) was not on columnar cache",
+				 ItemPointerGetBlockNumber(ctid),
+				 ItemPointerGetOffsetNumber(ctid));
+
+		rebalance = false;
+		if (cchunk->left)
+			rebalance |= ccache_merge_left_chunk(cchunk, cchunk->left);
+		if (cchunk->right)
+			rebalance |= ccache_merge_right_chunk(cchunk, cchunk->right);
+		if (rebalance)
+			ccache_rebalance_tree(ccache, cchunk);
+
+		return cchunk;
+	}
+	return NULL;
+}
+
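+/*
+ * ccache_vacuum_page
+ *
+ * It reclaims the tuples being already vacuumed on the given buffer page.
+ * It shall be kicked on the callback function of heap_page_prune_hook to
+ * synchronize contents of the cache with the on-disk image.
+ */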
+void
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* Note that it needs buffer being valid and pinned */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	Page			page = BufferGetPage(buffer);
+	OffsetNumber	maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemPointerData	ctid;
+		ItemId			itemid = PageGetItemId(page, offnum);
+
+		if (ItemIdIsNormal(itemid))
+			continue;
+
+		ItemPointerSetBlockNumber(&ctid, blknum);
+		ItemPointerSetOffsetNumber(&ctid, offnum);
+
+		ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+	}
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		/*
+		 * If no free ccache_entry is available, construct a new one
+		 * on demand to track the locally acquired columnar-cache.
+		 * Because get/put of a columnar-cache is a very frequent job, we
+		 * allocate tracking entries on the TopMemoryContext for reuse,
+		 * instead of allocating one for each operation.
+		 */
+		PG_TRY();
+		{
+			entry = MemoryContextAlloc(TopMemoryContext,
+									   sizeof(ccache_entry));
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+
+	/*
+	 * There is no columnar cache for this relation, or the cached attributes
+	 * are not enough to run the required query. So, it tries to create a new
+	 * ccache_head for the upcoming cache-scan.
+	 * Also allocate new ones, if we have no free ccache_head any more.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	LWLockInitialize(&new_cache->lock, 0);
+	new_cache->refcnt = 1;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find out existing ccache that has all the columns being
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			/* chose a set of columns to be cached */
+			if (old_cache)
+				attrs_used = ccache_new_attribute_set(tableoid,
+													  attrs_used,
+													  &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(&ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(&ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. The reason why this routine does not
+ * support variable length allocation is to simplify the logic for its purpose.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	void		   *address = NULL;
+	int				index;
+	int				retry = 2;
+
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
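+		/*
+		 * No free block is left, so try to evict the least-recently-used
+		 * columnar cache (unless it is still being constructed) and retry.
+		 */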
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			pg_memory_barrier();
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache_nolock(ccache);
+			else
+				dlist_move_head(&cs_ccache_hash->lru_list, &ccache->lru_chain);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+
+		index = dnode - cs_shmseg_head->blocks;
+		Assert(index >= 0 && index < shmseg_num_blocks);
+
+		memset(dnode, 0, sizeof(dlist_node));
+		address = (void *)((char *)cs_shmseg_head->base_address + 
+						   index * shmseg_blocksize);
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block that was allocated by cs_alloc_shmblock
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size		curr = (Size) address;
+	Size		base = cs_shmseg_head->base_address;
+	ulong		index;
+	dlist_node *dnode;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	dnode = &cs_shmseg_head->blocks[index];
+	Assert(dnode->prev == NULL && dnode->next == NULL);
+
+	dlist_push_head(&cs_shmseg_head->free_list, dnode);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	int		i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash
+		= ShmemInitStruct("cache_scan: hash of columnar cache",
+						  MAXALIGN(offsetof(ccache_hash,
+											slots[ccache_hash_size])),
+						  &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 (Size)shmseg_num_blocks *
+									 (Size)shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+	cs_shmseg_head->base_address
+		= MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		dlist_push_tail(&cs_shmseg_head->free_list,
+						&cs_shmseg_head->blocks[i]);
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							128,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   (Size)shmseg_num_blocks * (Size)shmseg_blocksize);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..9fea6ee
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,929 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing column
+ * oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/trigger.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/var.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+
+static bool		cache_scan_enabled;
+static double	cache_scan_width_threshold;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* List up all the columns being in-use */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row reference is not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Try to get an existing cache. If it exists, we assume this cache will
+	 * probably be available at the time this plan is executed.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		double	usage_ratio;
+		int		total_width = 0;
+		int		tuple_width = 0;
+
+		/*
+		 * Estimation of average width of cached columns - it does not make
+		 * sense to construct a new cache, if its average width is more than
+		 * the configured threshold; usually 30%.
+		 */
+		for (i=0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = tupdesc->attrs[i];
+			int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+			int		width;
+
+			if (attr->attlen > 0)
+				width = attr->attlen;
+			else
+				width = get_attavgwidth(tableoid, attr->attnum);
+
+			total_width += width;
+			if (bms_is_member(attidx, *attrs_used))
+				tuple_width += width;
+		}
+		usage_ratio = (double)tuple_width / (double)total_width;
+		if (usage_ratio > cache_scan_width_threshold / 100.0)
+			return false;
+
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		/*
+		 * If the existing cache holds all the required attributes,
+		 * we don't need to care about the width of the cached columns
+		 * (because it is obvious the width is less than the threshold).
+		 */
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * ccache_new_attribute_set
+ *
+ * It selects the attributes to be cached. If some of the newly required
+ * attributes are not cached yet, we re-construct a cache that holds the
+ * union of both sets, as long as its width does not grow beyond the
+ * configured threshold. If the (required | existing) set is wider than
+ * the threshold, we drop attributes in (~required & existing).
+ * Usually, the total width of the required columns is below the threshold,
+ * because of the checks at the planning stage.
+ */
+Bitmapset *
+ccache_new_attribute_set(Oid tableoid,
+						 Bitmapset *required, Bitmapset *existing)
+{
+	Form_pg_class	relform;
+	HeapTuple		reltup;
+	Bitmapset	   *difference;
+	int			   *attrs_width;
+	int				i, anum;
+	int				total_width;
+	int				required_width;
+	int				union_width;
+	double			usage_ratio;
+
+	reltup = SearchSysCache1(RELOID, ObjectIdGetDatum(tableoid));
+	if (!HeapTupleIsValid(reltup))
+		elog(ERROR, "cache lookup failed for relation %u", tableoid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	attrs_width = palloc0(sizeof(int) * relform->relnatts);
+
+	total_width = 0;
+	required_width = 0;
+	union_width = 0;
+	for (anum = 1; anum <= relform->relnatts; anum++)
+	{
+		Form_pg_attribute	attform;
+		HeapTuple			atttup;
+
+		atttup = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(tableoid),
+								 Int16GetDatum(anum));
+		if (!HeapTupleIsValid(atttup))
+			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+				 anum, tableoid);
+		attform = (Form_pg_attribute) GETSTRUCT(atttup);
+
+		if (attform->attisdropped)
+		{
+			ReleaseSysCache(atttup);
+			continue;
+		}
+
+		if (attform->attlen > 0)
+			attrs_width[anum - 1] = attform->attlen;
+		else
+			attrs_width[anum - 1] = get_attavgwidth(tableoid, anum);
+
+		total_width += attrs_width[anum - 1];
+		i = anum - FirstLowInvalidHeapAttributeNumber;
+		if (bms_is_member(i, required))
+		{
+			required_width += attrs_width[anum - 1];
+			union_width += attrs_width[anum - 1];
+		}
+		else if (bms_is_member(i, existing))
+			union_width += attrs_width[anum - 1];
+
+		ReleaseSysCache(atttup);
+	}
+	ReleaseSysCache(reltup);
+
+	/*
+	 * The easy case: if the union of both attribute sets still fits within
+	 * the threshold, we don't need to drop any columns from the cache.
+	 */
+	usage_ratio = (double) union_width / (double) total_width;
+	if (usage_ratio <= cache_scan_width_threshold / 100.0)
+		return bms_union(required, existing);
+
+	/*
+	 * Otherwise, we keep dropping the widest column that is not referenced
+	 * by the upcoming query, until the width of the cache falls below the
+	 * threshold.
+	 */
+	difference = bms_difference(existing, required);
+	do {
+		Bitmapset  *tempset = bms_copy(difference);
+		int			maxwidth = -1;
+		AttrNumber	maxwidth_anum = 0;
+
+		Assert(!bms_is_empty(tempset));
+		union_width = required_width;
+		while ((i = bms_first_member(tempset)) >= 0)
+		{
+			anum = i + FirstLowInvalidHeapAttributeNumber;
+
+			union_width += attrs_width[anum - 1];
+			if (attrs_width[anum - 1] > maxwidth)
+			{
+				maxwidth = attrs_width[anum - 1];
+				maxwidth_anum = anum;
+			}
+		}
+		pfree(tempset);
+
+		/* drop a column that has largest length */
+		Assert(maxwidth_anum > 0);
+		i = maxwidth_anum - FirstLowInvalidHeapAttributeNumber;
+		difference = bms_del_member(difference, i);
+		union_width -= maxwidth;
+
+		usage_ratio = (double) union_width / (double) total_width;
+	} while (usage_ratio > cache_scan_width_threshold / 100.0);
+
+	pfree(attrs_width);
+
+	return bms_union(required, difference);
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have a columnar cache also needs synchronizer triggers
+ * to ensure the on-memory cache keeps up with the latest heap contents.
+ * It returns TRUE if the supplied relation has triggers that invoke
+ * cache_scan_synchronizer in all the appropriate contexts; otherwise,
+ * FALSE is returned.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgfoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgfoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (!cache_scan_enabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* Core code should already acquire an appropriate lock  */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache
+		 * scan, but it could still have required parameterization due to
+		 * LATERAL refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		if (cs_estimate_costs(root, baserel, rel, cpath, &attrs_used))
+		{
+			cpath->custom_name = pstrdup("cache scan");
+			cpath->custom_flags = 0;
+			cpath->custom_private
+				= list_make1(makeString(bms_to_string(attrs_used)));
+
+			add_path(baserel, &cpath->path);
+		}
+	}
+	heap_close(rel, NoLock);
+}
+
+static void
+cs_init_custom_scan_plan(PlannerInfo *root,
+						 CustomScan *cscan_plan,
+						 CustomPath *cscan_path,
+						 List *tlist,
+						 List *scan_clauses)
+{
+	List	   *quals = NIL;
+	ListCell   *lc;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* extract the supplied RestrictInfo */
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = lfirst(lc);
+		quals = lappend(quals, rinfo->clause);
+	}
+
+	/* nothing special to do for qual push-down here */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = quals;
+	cscan_plan->custom_private = cscan_path->custom_private;
+}
+
+typedef struct
+{
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	bool			normal_seqscan;
+	bool			with_construction;
+} cs_state;
+
+static void
+cs_begin_custom_scan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Relation		rel = node->ss.ss_currentRelation;
+	EState		   *estate = node->ss.ps.state;
+	HeapScanDesc	scandesc = NULL;
+	cs_state	   *csstate;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	/* Do nothing if EXPLAIN without ANALYZE */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	csstate = palloc0(sizeof(cs_state));
+
+	attrs_used = bms_from_string(strVal(linitial(cscan->custom_private)));
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_SHARED);
+		if (ccache->status < CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(&ccache->lock);
+			LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
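+			/*
+			 * The lock was released before being re-acquired in exclusive
+			 * mode, so another backend may have changed the status in the
+			 * meantime; re-check it under the exclusive lock.
+			 */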
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(&ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	node->ss.ss_currentScanDesc = scandesc;
+
+	node->custom_state = csstate;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch a tuple that is invisible to us, because the columnar cache
+ * stores all the live tuples, including ones updated or deleted by
+ * concurrent sessions; it is the caller's job to check MVCC visibility.
+ * This routine decides whether we need to move on to the next tuple because
+ * of that visibility check. If the given tuple is NULL, it is obviously
+ * time to stop, because there are no more tuples on the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+	Relation		rel = node->ss.ss_currentRelation;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		if (!ccache)
+		{
+			/*
+			 * ccache == NULL implies one of two cases: (1) a fallback path
+			 * using a regular sequential scan instead of a cache-only scan,
+			 * or (2) cache construction failed during the scan. We need to
+			 * pay attention to the latter case because it uses SnapshotAny,
+			 * thus it fetches all the tuples including invisible ones.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+			buffer = scan->rs_cbuf;
+		}
+		else if (csstate->with_construction)
+		{
+			/*
+			 * "with_construction" means the columnar cache is under
+			 * construction, so we fetch a tuple from the heap of the
+			 * target relation and also insert it into the cache.
+			 * Note that we use SnapshotAny to fetch all the tuples, both
+			 * visible and invisible ones, so it is our responsibility
+			 * to check tuple visibility against the snapshot of the
+			 * current estate.
+			 * The same applies when we fetch tuples from the cache,
+			 * without referencing the heap buffer.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			if (HeapTupleIsValid(tuple))
+			{
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(&ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple failed, it usually means we
+					 * ran out of shared memory, so construction of the
+					 * columnar cache cannot continue. We release the
+					 * cache, leaving it in the under-construction status;
+					 * that prevents others from grabbing it again, and we
+					 * fall back to a regular sequential scan for the
+					 * remaining portion.
+					 */
+					cs_put_ccache(ccache);
+					LWLockRelease(&ccache->lock);
+					csstate->ccache = NULL;
+				}
+				buffer = scan->rs_cbuf;
+			}
+			else
+			{
+				/*
+				 * Once we reach the end of the relation, the columnar
+				 * cache has been fully constructed.
+				 */
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(&ccache->lock);
+				buffer = scan->rs_cbuf;
+			}
+		}
+		else
+		{
+			LWLockAcquire(&ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(&ccache->lock);
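+			/*
+			 * The tuple comes from the cache, not from a heap buffer, so
+			 * InvalidBuffer is passed down; the core SetHintBits() change
+			 * lets the visibility check skip hint-bit updates in this case.
+			 */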
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_scan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_scan(CustomScanState *node)
+{
+	cs_state	   *csstate = node->custom_state;
+
+	/* nothing to cleanup, if EXPLAIN without ANALYZE */
+	if (!csstate)
+		return;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(&ccache->lock);
+
+		/*
+		 * If the status of the columnar cache is still "in progress",
+		 * the table scan did not reach the end of the relation, so the
+		 * cache was not constructed completely and has to be removed.
+		 * Otherwise, we keep the ccache that was originally created with
+		 * refcnt=1, but untrack it locally.
+		 */
+		if (needs_remove || !csstate->with_construction)
+			cs_put_ccache(ccache);
+		else if (csstate->with_construction)
+			untrack_ccache_locally(ccache);
+	}
+	if (node->ss.ss_currentScanDesc)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+static void
+cs_rescan_custom_scan(CustomScanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * trigger function to synchronize the columnar-cache with heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
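+			/*
+			 * TRUNCATE invalidates the whole cache, so drop it unless it
+			 * is still under construction.
+			 */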
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(&ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache if the cached table was altered or
+ * dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE needs cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * It is a callback invoked when a particular heap block gets vacuumed.
+ * On vacuuming, the space occupied by dead tuples is reclaimed and tuple
+ * locations may be moved.
+ * This routine reclaims the space of dead tuples on the columnar cache
+ * as well, according to the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	/*
+	 * If relation already has a columnar-cache, it needs to be cleaned up
+	 * according to the heap vacuuming, also.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+		ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(&ccache->lock);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.enabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_enabled,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	DefineCustomRealVariable("cache_scan.width_threshold",
+							 "threshold percentage to be cached",
+							 NULL,
+							 &cache_scan_width_threshold,
+							 30.0,
+							 0.0,
+							 100.0,
+							 PGC_SIGHUP,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	memset(&provider, 0, sizeof(provider));
+	strncpy(provider.name, "cache scan", sizeof(provider.name));
+	provider.InitCustomScanPlan	= cs_init_custom_scan_plan;
+	provider.BeginCustomScan	= cs_begin_custom_scan;
+	provider.ExecCustomScan		= cs_exec_custom_scan;
+	provider.EndCustomScan		= cs_end_custom_scan;
+	provider.ReScanCustomScan	= cs_rescan_custom_scan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..df8d0de
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,266 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache-scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap
+   scan, when a previous scan has already loaded the contents of the table
+   into the cache.
+   Unlike the buffer cache, it holds the contents of a limited number of
+   columns rather than whole records, so it can hold a larger number of
+   records in the same amount of RAM. This characteristic is particularly
+   useful for analytic queries on tables with many columns and records.
+  </para>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   This allows it to offer an additional scan path on regular relations that
+   uses the on-memory columnar cache instead of a regular heap scan.
+   It also serves as a proof-of-concept implementation of the custom-scan
+   API, which allows extensions to extend the core executor.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded via the
+   <xref linkend="guc-shared-preload-libraries"> parameter, because it
+   acquires a certain amount of shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented by the <literal>cache_scan_synchronizer</>
+   function, which keep the cache contents in sync with the latest heap on
+   <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> and
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can set up this extension with the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create the synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create the synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
+    </para>
+   </step>
+  </procedure>
+ </sect2>
+
+ <sect2>
+  <title>How it works</title>
+  <para>
+   This module works in the usual fashion of
+   <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has the
+   synchronizer triggers and the total width of the referenced columns is
+   less than 30% of the average record width.
+   The query optimizer then picks the cheapest path. If the chosen path is
+   a custom-scan path managed by <filename>cache_scan</>, the scan on the
+   target relation runs using the columnar cache.
+   On the first run, it constructs the relation's cache alongside a regular
+   sequential scan. From the second run onwards, it can run on the columnar
+   cache without referencing the heap.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> with
+   the <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, consists of one
+   or more chunks that serve as nodes or leaves of a T-tree structure.
+   The <literal>cache_scan_debuginfo()</> function dumps useful information,
+   namely the properties of all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ct
+id
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each
+   chunk has an array of partial tuples with its min and max values.
+   Its left node links to chunks that hold tuples with smaller
+   <literal>ctid</>, and its right node links to chunks that hold larger
+   ones.
+   This makes it possible to locate tuples quickly when they need to be
+   invalidated because of heap updates by DDL, DML or vacuuming.
+  </para>
+  <para>
+   The columnar cache is not owned by a particular session, so it is
+   retained until it is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of each block on the shared memory
+      segment used for the columnar cache. Changing it requires a postmaster
+      restart.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires
+      <literal>cache_scan.num_blocks</> x <literal>cache_scan.block_size</>
+      bytes of shared memory at startup time, then allocates blocks for the
+      columnar cache on demand.
+      Too large a block size reduces the flexibility of memory assignment,
+      and too small a block size spends too much management area per block.
+      So, we recommend keeping the default value, which is 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks on the shared memory
+      segment used for the columnar cache. Changing it requires a postmaster
+      restart.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires
+      <literal>cache_scan.num_blocks</> x <literal>cache_scan.block_size</>
+      bytes of shared memory at startup time, then allocates blocks for the
+      columnar cache on demand.
+      Too small a number of blocks reduces the flexibility of memory
+      assignment and may cause undesired cache dropping.
+      So, we recommend setting a number of blocks large enough to keep the
+      contents of the target relations in memory.
+      Its default is <literal>64</literal> (128MB with the default block
+      size), which is probably too small for most real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots of the internal hash table
+      that links every columnar cache, distributed by the table's OID.
+      Its default is <literal>128</>; there is usually no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number that can be stored
+      on the columnar cache. Because of the internal data representation, the
+      bitmap set that tracks the cached attributes has to be fixed-length,
+      so the largest attribute number needs to be fixed in advance.
+      Its default is <literal>128</>, although most tables likely have fewer
+      than 100 columns.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-enabled" xreflabel="cache_scan.enabled">
+    <term><varname>cache_scan.enabled</> (<type>boolean</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.enabled</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter enables or disables the query planner's use of
+      cache-only scans, even if one is ready to run.
+      Note that this parameter does not affect the synchronizer triggers,
+      so an already constructed columnar cache keeps being synchronized
+      even if cache-only scan is disabled later.
+      The default is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-width_threshold" xreflabel="cache_scan.width_threshold">
+    <term><varname>cache_scan.width_threshold</> (<type>float</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.width_threshold</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the threshold for proposing a cache-only scan
+      plan to the planner. (If the proposed scan plan is cheap enough, the
+      planner will choose it instead of the built-in ones.)
+      The extension tries to build a cache-only scan plan only if the total
+      width of the referenced columns is less than this percentage of the
+      average record width.
+      The default is <literal>30.0</>, which means a cache-only scan plan is
+      proposed to the planner if the sum of the widths of the referenced
+      columns is less than
+      <literal>(30.0 / 100.0) x (average width of table)</>.
+     </para>
+     <para>
+      The columnar cache only makes sense if the width of the cached columns
+      is much smaller than the total width of the table definition, so this
+      threshold filters out scans that reference many columns; such scans
+      would consume a significant amount of shared memory and eventually
+      kill the benefit.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 336ba0c..fdc4ba3 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &cube;
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..71070fe 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -55,6 +55,20 @@
      </para>
     </listitem>
    </varlistentry>
+   <varlistentry>
+    <term><xref linkend="cache-scan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan that refers to the
+      on-memory columnar cache instead of the heap, if the cache has already
+      been constructed for the target relation.
+      Unlike the buffer cache, it holds only the limited set of columns that
+      have been referenced before, not all the columns of the table
+      definition.
+      Thus, it can cache a much larger number of records in memory than the
+      buffer cache.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </para>
  <para>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d63b1a8..b75d7df 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 27cbac8..1fb5f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,6 +42,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bfdadc3..9775aad 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 					TransactionId OldestXmin);
 extern int heap_page_prune(Relation relation, Buffer buffer,
#18Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kouhei Kaigai (#17)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)

+ ccache_merge_chunk(ccache, pchunk);

+ pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread
across two cache chunks. Actually one cache chunk can accommodate one
or more than heap pages. it needs some other way of handling.

I adjusted the logic to merge the chunks as follows:

Once a tuple is vacuumed from a chunk, it also checks whether it can be
merged with its child leafs. A chunk has up to two child leafs; left one
has less ctid that the parent, and right one has greater ctid. It means
a chunk without right child in the left sub-tree or a chunk without left
child in the right sub-tree are neighbor of the chunk being vacuumed. In
addition, if vacuumed chunk does not have either (or both) of children,
it can be merged with parent node.
I modified ccache_vacuum_tuple() to merge chunks during t-tree walk-down,
if vacuumed chunk has enough free space.

Patch looks good.

Regarding merging of the nodes, instead of checking whether a merge is
possible for every tuple which is vacuumed, can we put some kind of
threshold, such as: whenever a node is 50% free, try to merge it from the
leaf nodes until it is 90% full? The remaining 10% would be left for the
next inserts on the node. What do you say?

I will update you later regarding the performance test results.

Regards,
Hari Babu
Fujitsu Australia

#19Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Haribabu Kommi (#18)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

2014-03-06 18:17 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)

+ ccache_merge_chunk(ccache, pchunk);

+ pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread
across two cache chunks. Actually one cache chunk can accommodate one
or more than heap pages. it needs some other way of handling.

I adjusted the logic to merge the chunks as follows:

Once a tuple is vacuumed from a chunk, it also checks whether it can be
merged with its child leafs. A chunk has up to two child leafs; left one
has less ctid that the parent, and right one has greater ctid. It means
a chunk without right child in the left sub-tree or a chunk without left
child in the right sub-tree are neighbor of the chunk being vacuumed. In
addition, if vacuumed chunk does not have either (or both) of children,
it can be merged with parent node.
I modified ccache_vacuum_tuple() to merge chunks during t-tree
walk-down,
if vacuumed chunk has enough free space.

Patch looks good.

Thanks for your volunteering.

Regarding merging of the nodes, instead of checking whether a merge is
possible for every tuple which is vacuumed, can we put some kind of
threshold, such as: whenever a node is 50% free, try to merge it from the
leaf nodes until it is 90% full? The remaining 10% would be left for the
next inserts on the node. What do you say?

Hmm. Indeed, it makes sense. How about an idea that kicks off chunk merging
if the "expected" free space of the merged chunk is less than 50%?
If the threshold depends on the (expected) usage of the merged chunk, it can
avoid over-merging.
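
Just to make the idea concrete, a rough sketch of such a check might look
like the following. The 50% constant and the helper function name are
made-up for illustration only; the usage and deadspace fields follow the
ccache_chunk definition in the patch, assuming they count bytes:

    /* hypothetical sketch - merge only if the merged chunk keeps >= 50% free */
    #define CCACHE_MERGE_THRESHOLD    0.50

    static bool
    ccache_merge_is_worthwhile(ccache_chunk *cchunk, ccache_chunk *neighbor,
                               Size chunk_size)
    {
        /* live payload of both chunks, excluding space held by dead tuples */
        Size    live_usage = (cchunk->usage - cchunk->deadspace) +
                             (neighbor->usage - neighbor->deadspace);
        /* expected free space ratio of the merged chunk */
        double  expected_free = 1.0 - (double) live_usage / (double) chunk_size;

        return expected_free >= CCACHE_MERGE_THRESHOLD;
    }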

I will update you later regarding the performance test results.

Thanks,

Also, I'll rebase the patch on top of the new custom-scan interfaces
according to Tom's suggestion, even though main logic of cache_scan
is not changed.

Best regards,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#20Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kohei KaiGai (#19)
1 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

2014-03-06 18:17 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the readings.

Configuration:

Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

                   Head       Patched     Diff
 Select - 500K     772ms      2659ms      -200%
 Insert - 400K     3429ms     1948ms      43%    (I am not sure how it improved in this case)
 Delete - 200K     2066ms     3978ms      -92%
 Update - 200K     3915ms     5899ms      -50%

This patch shows how the custom scan can be used very well, but as it
stands, the patch has some performance problems that need to be
investigated.

I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

cache_scan_test.txt (text/plain)
#21Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#20)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Thanks for your efforts!

Head patched
Diff
Select - 500K 772ms 2659ms -200%
Insert - 400K 3429ms 1948ms 43% (I am
not sure how it improved in this case)
delete - 200K 2066ms 3978ms -92%
update - 200K 3915ms 5899ms -50%

This patch shown how the custom scan can be used very well but coming to
patch as It is having some performance problem which needs to be
investigated.

I attached the test script file used for the performance test.

First of all, it seems to me your test case has too small a data set, one
that allows all the data to be held in memory - roughly, 500K records of
200 bytes will consume about 100MB. Your configuration allocates 512MB of
shared_buffers, and about 3GB of OS-level page cache is available.
(Note that Linux uses free memory as disk cache adaptively.)

This cache is designed to hide the latency of disk accesses, so this test
case does not fit its intention.
(Also, the primary purpose of this module is to demonstrate the
heap_page_prune_hook for hooking vacuuming, so simple code was preferred
over a more complicated implementation with better performance.)

I could reproduce the overall trend: a scan without the cache is faster
than the cached scan if the buffer is already in memory. Probably it comes
from the cost of walking down the T-tree index by ctid for every reference.
The performance penalty around UPDATE and DELETE likely comes from the
trigger invocation per row.
I could observe a small performance gain on INSERT.
It's strange for me, too. :-(

On the other hand, the discussion around the custom-plan interface affects
this module because it uses that API as its foundation.
Please wait a few days for me to rebase the cache_scan module onto the
newer custom-plan interface that I submitted just a moment ago.

Also, is it really necessary to tune performance in this example module for
the heap_page_prune_hook?
Even though I have a few ideas to improve the cache performance, such as
inserting multiple rows at once or copying chunks locally instead of
walking down the T-tree, I'm not sure whether it is productive in the
current v9.4 timeframe. ;-(

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Haribabu Kommi
Sent: Wednesday, March 12, 2014 1:14 PM
To: Kohei KaiGai
Cc: Kaigai Kouhei(海外 浩平); Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
table scan?)

On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

2014-03-06 18:17 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:

I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the readings.

Configuration:

Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

Head patched
Diff
Select - 500K 772ms 2659ms -200%
Insert - 400K 3429ms 1948ms 43% (I am
not sure how it improved in this case)
delete - 200K 2066ms 3978ms -92%
update - 200K 3915ms 5899ms -50%

This patch shown how the custom scan can be used very well but coming to
patch as It is having some performance problem which needs to be
investigated.

I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia


#22Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kouhei Kaigai (#21)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Thanks for your efforts!

Head patched
Diff
Select - 500K 772ms 2659ms -200%
Insert - 400K 3429ms 1948ms 43% (I am
not sure how it improved in this case)
delete - 200K 2066ms 3978ms -92%
update - 200K 3915ms 5899ms -50%

This patch shown how the custom scan can be used very well but coming to
patch as It is having some performance problem which needs to be
investigated.

I attached the test script file used for the performance test.

First of all, it seems to me your test case has too small data set that
allows to hold all the data in memory - briefly 500K of 200bytes record
will consume about 100MB. Your configuration allocates 512MB of
shared_buffer, and about 3GB of OS-level page cache is available.
(Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information, and a small correction: the total number of
records is 5 million. The select operation is selecting 500K records. The
total table size is around 1GB.

Once I get your new patch rebased on the custom scan patch, I will test the
performance again with a database size larger than the RAM size, and I will
also make sure that the memory available for the disk cache is smaller.

Regards,
Hari Babu
Fujitsu Australia


#23Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#22)
3 attachment(s)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Hello,

The attached patches are revised according to the latest custom-plan
interface patch (v11).
The cache-scan module was re-implemented on the newer interface. I also
noticed that the extension did not handle redirected tuples correctly,
so I revised the logic in ccache_vacuum_page() entirely. It now synchronizes
the cached tuples per page, not per tuple, and also tries to merge t-tree
chunks on a per-page basis.

Also, I split the patches again because the *demonstration* part is much
larger than the patches to the core backend. It will help reviewing.
* pgsql-v9.4-vacuum_page_hook.v11.patch
  -> It adds a hook invoked for each page being vacuumed; an extension
     needs it to synchronize the status of its in-memory cache.
* pgsql-v9.4-mvcc_allows_cache.v11.patch
  -> It allows HeapTupleSatisfiesVisibility() to run on tuples held on
     the in-memory cache, not on the heap.
* pgsql-v9.4-example-cache_scan.v11.patch
  -> It demonstrates the usage of the above two patches. It allows scanning
     a relation without storage access where possible.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Haribabu Kommi
Sent: Wednesday, March 12, 2014 3:43 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Kohei KaiGai; Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
table scan?)

On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Thanks for your efforts!

Head patched
Diff
Select - 500K 772ms 2659ms -200%
Insert - 400K 3429ms 1948ms 43% (I am
not sure how it improved in this case)
delete - 200K 2066ms 3978ms -92%
update - 200K 3915ms 5899ms -50%

This patch shown how the custom scan can be used very well but coming
to patch as It is having some performance problem which needs to be
investigated.

I attached the test script file used for the performance test.

First of all, it seems to me your test case has too small data set
that allows to hold all the data in memory - briefly 500K of 200bytes
record will consume about 100MB. Your configuration allocates 512MB of
shared_buffer, and about 3GB of OS-level page cache is available.
(Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information and a small correction. The Total number of
records are 5 million.
The select operation is selecting 500K records. The total table size is
around 1GB.

Once I get your new patch re-based on the custom scan patch, I will test
the performance again by increasing my database size more than the RAM size.
And also I will make sure that memory available for disk cache is less.

Regards,
Hari Babu
Fujitsu Australia


Attachments:

pgsql-v9.4-example-cache_scan.v11.patch (application/octet-stream)
 contrib/cache_scan/Makefile                        |   19 +
 contrib/cache_scan/cache_scan--1.0.sql             |   26 +
 contrib/cache_scan/cache_scan--unpackaged--1.0.sql |    3 +
 contrib/cache_scan/cache_scan.control              |    5 +
 contrib/cache_scan/cache_scan.h                    |   81 +
 contrib/cache_scan/ccache.c                        | 1576 ++++++++++++++++++++
 contrib/cache_scan/cscan.c                         | 1163 +++++++++++++++
 doc/src/sgml/cache-scan.sgml                       |  266 ++++
 doc/src/sgml/contrib.sgml                          |    1 +
 doc/src/sgml/filelist.sgml                         |    1 +
 10 files changed, 3141 insertions(+)

diff --git a/contrib/cache_scan/Makefile b/contrib/cache_scan/Makefile
new file mode 100644
index 0000000..c409817
--- /dev/null
+++ b/contrib/cache_scan/Makefile
@@ -0,0 +1,19 @@
+# contrib/cache_scan/Makefile
+
+MODULE_big = cache_scan
+OBJS = cscan.o ccache.o
+
+EXTENSION = cache_scan
+DATA = cache_scan--1.0.sql cache_scan--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/cache_scan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/contrib/cache_scan/cache_scan--1.0.sql b/contrib/cache_scan/cache_scan--1.0.sql
new file mode 100644
index 0000000..4bd04d1
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--1.0.sql
@@ -0,0 +1,26 @@
+CREATE FUNCTION public.cache_scan_synchronizer()
+RETURNS trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE TYPE public.__cache_scan_debuginfo AS
+(
+	tableoid	oid,
+	status		text,
+	chunk		text,
+	upper		text,
+	l_depth		int4,
+	l_chunk		text,
+	r_depth		int4,
+	r_chunk		text,
+	ntuples		int4,
+	usage		int4,
+	min_ctid	tid,
+	max_ctid	tid
+);
+CREATE FUNCTION public.cache_scan_debuginfo()
+  RETURNS SETOF public.__cache_scan_debuginfo
+  AS 'MODULE_PATHNAME'
+  LANGUAGE C STRICT;
+
+
diff --git a/contrib/cache_scan/cache_scan--unpackaged--1.0.sql b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
new file mode 100644
index 0000000..718a2de
--- /dev/null
+++ b/contrib/cache_scan/cache_scan--unpackaged--1.0.sql
@@ -0,0 +1,3 @@
+DROP FUNCTION public.cache_scan_synchronizer() CASCADE;
+DROP FUNCTION public.cache_scan_debuginfo() CASCADE;
+DROP TYPE public.__cache_scan_debuginfo;
diff --git a/contrib/cache_scan/cache_scan.control b/contrib/cache_scan/cache_scan.control
new file mode 100644
index 0000000..77946da
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.control
@@ -0,0 +1,5 @@
+# cache_scan extension
+comment = 'custom scan provider for cache-only scan'
+default_version = '1.0'
+module_pathname = '$libdir/cache_scan'
+relocatable = false
diff --git a/contrib/cache_scan/cache_scan.h b/contrib/cache_scan/cache_scan.h
new file mode 100644
index 0000000..aac5fa6
--- /dev/null
+++ b/contrib/cache_scan/cache_scan.h
@@ -0,0 +1,81 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cache_scan.h
+ *
+ * Definitions for the cache_scan extension
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef CACHE_SCAN_H
+#define CACHE_SCAN_H
+#include "access/htup_details.h"
+#include "lib/ilist.h"
+#include "nodes/bitmapset.h"
+#include "storage/lwlock.h"
+#include "utils/rel.h"
+
+typedef struct ccache_chunk {
+	struct ccache_chunk	*upper;	/* link to the upper node */
+	struct ccache_chunk *right;	/* link to the greaternode, if exist */
+	struct ccache_chunk *left;	/* link to the less node, if exist */
+	int				r_depth;	/* max depth in right branch */
+	int				l_depth;	/* max depth in left branch */
+	uint32			ntups;		/* number of tuples being cached */
+	uint32			usage;		/* usage counter of this chunk */
+	uint32			deadspace;	/* payload by dead tuples */
+	HeapTuple		tuples[FLEXIBLE_ARRAY_MEMBER];
+} ccache_chunk;
+
+/*
+ * Status flag of columnar cache. A ccache_head is created with status of
+ * CCACHE_STATUS_INITIALIZED, then someone picks up the cache_head from
+ * the hash table and marks it as CCACHE_STATUS_IN_PROGRESS; that means
+ * this cache is under construction by a particular scan. Once it is
+ * constructed, it gets the CCACHE_STATUS_CONSTRUCTED state.
+ */
+#define CCACHE_STATUS_INITIALIZED	1
+#define CCACHE_STATUS_IN_PROGRESS	2
+#define CCACHE_STATUS_CONSTRUCTED	3
+
+typedef struct {
+	LWLock			lock;	/* used to protect ttree links */
+	volatile int	refcnt;
+	int				status;
+
+	dlist_node		hash_chain;	/* linked to ccache_hash->slots[] or
+								 * free_list. Elsewhere, unlinked */
+	dlist_node		lru_chain;	/* linked to ccache_hash->lru_list */
+
+	Oid				tableoid;
+	ccache_chunk   *root_chunk;
+	Bitmapset		attrs_used;	/* !Bitmapset is variable length! */
+} ccache_head;
+
+extern int ccache_max_attribute_number(void);
+extern Bitmapset *ccache_new_attribute_set(Oid tableoid,
+										   Bitmapset *required,
+										   Bitmapset *existing);
+extern ccache_head *cs_get_ccache(Oid tableoid, Bitmapset *attrs_used,
+								  bool create_on_demand);
+extern void cs_put_ccache(ccache_head *ccache);
+extern void untrack_ccache_locally(ccache_head *ccache);
+
+extern bool ccache_insert_tuple(ccache_head *ccache,
+								Relation rel, HeapTuple tuple);
+extern bool ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup);
+
+extern bool ccache_vacuum_page(ccache_head *ccache, Buffer buffer);
+
+extern HeapTuple ccache_find_tuple(ccache_chunk *cchunk,
+								   ItemPointer ctid,
+								   ScanDirection direction);
+extern void ccache_init(void);
+
+extern Datum cache_scan_synchronizer(PG_FUNCTION_ARGS);
+extern Datum cache_scan_debuginfo(PG_FUNCTION_ARGS);
+
+extern void	_PG_init(void);
+
+#endif /* CACHE_SCAN_H */
diff --git a/contrib/cache_scan/ccache.c b/contrib/cache_scan/ccache.c
new file mode 100644
index 0000000..30d6631
--- /dev/null
+++ b/contrib/cache_scan/ccache.c
@@ -0,0 +1,1576 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/ccache.c
+ *
+ * Routines for the column-culled cache implementation
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "storage/barrier.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "cache_scan.h"
+
+/*
+ * Hash table to manage all the ccache_head
+ */
+typedef struct {
+	slock_t			lock;		/* lock of the hash table */
+	dlist_head		lru_list;	/* list of recently used cache */
+	dlist_head		free_list;	/* list of free ccache_head */
+	dlist_head		slots[FLEXIBLE_ARRAY_MEMBER];
+} ccache_hash;
+
+/*
+ * shmseg_head
+ *
+ * A data structure to manage blocks on the shared memory segment.
+ * This extension acquires (shmseg_blocksize) x (shmseg_num_blocks) bytes of
+ * shared memory at startup time, then splits it into multiple fixed-length
+ * memory blocks. All (internal) memory allocation and release is done per
+ * block, to avoid the memory fragmentation that would otherwise complicate
+ * the implementation.
+ *
+ * The shmseg_head has a spinlock and a global free_list to link free blocks.
+ * Each element of its blocks[] array represents the state of the block it
+ * is associated with. If it is chained to the free_list, the block is not
+ * allocated yet; otherwise, it is allocated to someone and thus unavailable.
+ *
+ * A block-mapped region is handled as a ccache_chunk. This structure has
+ * some fixed-length fields and a variable-length array that stores pointers
+ * to HeapTupleData. The array grows from the head towards the tail of the
+ * block according to the number of tuples stored there. On the other hand,
+ * the heap-tuple contents are placed at the tail of the block, so their
+ * usage grows from the tail towards the head.
+ * Thus, a chunk (= a block) can store multiple heap-tuples as long as the
+ * usage of the pointer array does not cross the usage of the heap-tuple
+ * contents.
+ *
+ * [layout of a block]
+ * +------------------------+  +0
+ * | struct ccache_chunk {  |
+ * |       :                |
+ * |       :                |
+ * |   HeapTuple tuples[];  |
+ * | };    :                |
+ * |       |                |
+ * |       v                |
+ * |                        |
+ * |                        |
+ * |       ^                |
+ * |       |                |
+ * |   buffer for           |
+ * | tuple contents         |
+ * |       |                |
+ * |       :                |
+ * +------------------------+  +(shmseg_blocksize - 1)
+ */
+typedef struct {
+	slock_t			lock;
+	dlist_head		free_list;
+	Size			base_address;
+	dlist_node		blocks[FLEXIBLE_ARRAY_MEMBER];
+} shmseg_head;
+
+/*
+ * ccache_entry is used to track ccache_head being acquired by this backend.
+ */
+typedef struct {
+	dlist_node		chain;
+	ResourceOwner	owner;
+	ccache_head	   *ccache;
+} ccache_entry;
+
+static dlist_head	ccache_local_list;
+static dlist_head	ccache_free_list;
+
+/* Static variables */
+static shmem_startup_hook_type  shmem_startup_next = NULL;
+
+static ccache_hash *cs_ccache_hash = NULL;
+static shmseg_head *cs_shmseg_head = NULL;
+
+/* GUC variables */
+static int  ccache_hash_size;
+static int  shmseg_blocksize;
+static int  shmseg_num_blocks;
+static int  max_cached_attnum;
+
+/* Static functions */
+static void *cs_alloc_shmblock(void);
+static void	 cs_free_shmblock(void *address);
+
+#define AssertIfNotShmem(addr)										\
+	Assert((addr) == NULL ||										\
+		   (((Size)(addr)) >= cs_shmseg_head->base_address &&		\
+			((Size)(addr)) < (cs_shmseg_head->base_address +		\
+						(Size)shmseg_num_blocks * (Size)shmseg_blocksize)))
+
+/*
+ * cchunk_sanity_check - for debugging
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+cchunk_sanity_check(ccache_chunk *cchunk)
+{
+	ccache_chunk   *uchunk = cchunk->upper;
+
+	Assert(!uchunk || uchunk->left == cchunk || uchunk->right == cchunk);
+	AssertIfNotShmem(cchunk->right);
+	AssertIfNotShmem(cchunk->left);
+
+	Assert(cchunk->usage <= shmseg_blocksize);
+	Assert(offsetof(ccache_chunk, tuples[cchunk->ntups]) <= cchunk->usage);
+#if NOT_USED	/* more nervous sanity checks */
+	{
+		int		i;
+		for (i=0; i < cchunk->ntups; i++)
+		{
+			HeapTuple	tuple = cchunk->tuples[i];
+
+			Assert(tuple != NULL &&
+				   (ulong)tuple >= (ulong)(&cchunk->tuples[cchunk->ntups]) &&
+				   (ulong)tuple < (ulong)cchunk + shmseg_blocksize);
+			Assert(tuple->t_data != NULL &&
+				   (ulong)tuple->t_data >= (ulong)tuple &&
+				   (ulong)tuple->t_data < (ulong)cchunk + shmseg_blocksize);
+		}
+	}
+#endif
+}
+#else
+#define	cchunk_sanity_check(chunk)	do {} while(0)
+#endif
+
+int
+ccache_max_attribute_number(void)
+{
+	return (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+			BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+}
+
+/*
+ * ccache_on_resource_release
+ *
+ * A callback to put any ccache_head acquired locally, to keep the
+ * reference counter consistent on resource-owner release.
+ */
+static void
+ccache_on_resource_release(ResourceReleasePhase phase,
+						   bool isCommit,
+						   bool isTopLevel,
+						   void *arg)
+{
+	dlist_mutable_iter	iter;
+
+	if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+		return;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry   *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+
+			if (isCommit)
+				elog(WARNING, "cache reference leak (tableoid=%u, refcnt=%d)",
+					 entry->ccache->tableoid, entry->ccache->refcnt);
+			cs_put_ccache(entry->ccache);
+
+			entry->ccache = NULL;
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+	}
+}
+
+static ccache_chunk *
+ccache_alloc_chunk(ccache_head *ccache, ccache_chunk *upper)
+{
+	ccache_chunk *cchunk = cs_alloc_shmblock();
+
+	if (cchunk)
+	{
+		cchunk->upper = upper;
+		cchunk->right = NULL;
+		cchunk->left = NULL;
+		cchunk->r_depth = 0;
+		cchunk->l_depth = 0;
+		cchunk->ntups = 0;
+		cchunk->usage = shmseg_blocksize;
+		cchunk->deadspace = 0;
+	}
+	return cchunk;
+}
+
+/*
+ * ccache_rebalance_tree
+ *
+ * It restores the balance of the ccache T-tree if the supplied chunk has
+ * unbalanced subtrees.
+ */
+#define TTREE_DEPTH(chunk)	\
+	((chunk) == 0 ? 0 : Max((chunk)->l_depth, (chunk)->r_depth) + 1)
+
+static void
+ccache_rebalance_tree(ccache_head *ccache, ccache_chunk *cchunk)
+{
+	Assert(cchunk->upper != NULL
+		   ? (cchunk->upper->left == cchunk || cchunk->upper->right == cchunk)
+		   : (ccache->root_chunk == cchunk));
+
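+	/*
+	 * If the depths of the left and right subtrees differ by more than
+	 * one level, rotate the deeper side up to restore the balance.
+	 */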
+	if (cchunk->l_depth + 1 < cchunk->r_depth)
+	{
+		/* anticlockwise rotation */
+		ccache_chunk   *rchunk = cchunk->right;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->right = rchunk->left;
+		cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		cchunk->upper = rchunk;
+		if (cchunk->right)
+			cchunk->right->upper = cchunk;
+
+		rchunk->left = cchunk;
+		rchunk->l_depth = TTREE_DEPTH(rchunk->left);
+		rchunk->upper = upper;
+		cchunk->upper = rchunk;
+
+		if (!upper)
+			ccache->root_chunk = rchunk;
+		else if (upper->left == cchunk)
+		{
+			upper->left = rchunk;
+			upper->l_depth = TTREE_DEPTH(rchunk);
+		}
+		else
+		{
+			Assert(upper->right == cchunk);
+			upper->right = rchunk;
+			upper->r_depth = TTREE_DEPTH(rchunk);
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(rchunk->left);
+		AssertIfNotShmem(rchunk->right);
+		AssertIfNotShmem(rchunk->upper);
+	}
+	else if (cchunk->l_depth > cchunk->r_depth + 1)
+	{
+		/* clockwise rotation */
+		ccache_chunk   *lchunk = cchunk->left;
+		ccache_chunk   *upper = cchunk->upper;
+
+		cchunk->left = lchunk->right;
+		cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		cchunk->upper = lchunk;
+		if (cchunk->left)
+			cchunk->left->upper = cchunk;
+
+		lchunk->right = cchunk;
+		lchunk->r_depth = TTREE_DEPTH(lchunk->right);
+		lchunk->upper = upper;
+		cchunk->upper = lchunk;
+
+		if (!upper)
+			ccache->root_chunk = lchunk;
+		else if (upper->right == cchunk)
+		{
+			upper->right = lchunk;
+			upper->r_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		else
+		{
+			Assert(upper->left == cchunk);
+			upper->left = lchunk;
+			upper->l_depth = TTREE_DEPTH(lchunk) + 1;
+		}
+		AssertIfNotShmem(cchunk->right);
+		AssertIfNotShmem(cchunk->left);
+		AssertIfNotShmem(cchunk->upper);
+		AssertIfNotShmem(lchunk->left);
+		AssertIfNotShmem(lchunk->right);
+		AssertIfNotShmem(lchunk->upper);
+	}
+	cchunk_sanity_check(cchunk);
+}
+
+/* it computes "actual" free space we can use right now */
+#define cchunk_freespace(cchunk)		\
+	((cchunk)->usage - offsetof(ccache_chunk, tuples[(cchunk)->ntups]))
+/* it computes "expected" free space we can use if compaction */
+#define cchunk_availablespace(cchunk)	\
+	(cchunk_freespace(cchunk) + (cchunk)->deadspace)
+/* it computes space in use. Sum with availablespace is always blocksize */
+#define cchunk_usedspace(cchunk)						\
+	(offsetof(ccache_chunk, tuples[(cchunk)->ntups])	\
+	 + shmseg_blocksize - (cchunk)->usage - (cchunk)->deadspace)
+
+/*
+ * ccache_chunk_compaction
+ *
+ * It moves the existing tuples to eliminate dead space within the chunk.
+ * Afterwards, the chunk's deadspace becomes zero.
+ */
+static void
+ccache_chunk_compaction(ccache_chunk *cchunk)
+{
+	ccache_chunk   *temp = alloca(shmseg_blocksize);
+	int				i;
+
+	/* setting up temporary chunk */
+	temp->upper		= cchunk->upper;
+	temp->right		= cchunk->right;
+	temp->left		= cchunk->left;
+	temp->r_depth	= cchunk->r_depth;
+	temp->l_depth	= cchunk->l_depth;
+	temp->ntups		= cchunk->ntups;
+	temp->usage		= shmseg_blocksize;
+	temp->deadspace	= 0;
+
+	for (i=0; i < cchunk->ntups; i++)
+	{
+		HeapTuple	tuple = cchunk->tuples[i];
+		HeapTuple	dest;
+		uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+		uint32		offset;
+
+		Assert(required + sizeof(HeapTuple) <= cchunk_freespace(temp));
+
+		temp->usage -= required;
+		offset = temp->usage;
+
+		/*
+		 * Although we usually put the body of HeapTupleHeaderData just after
+		 * HeapTupleData, there is no guarantee that both data structures are
+		 * located at contiguous memory addresses.
+		 * So, we explicitly adjust tuple->t_data to point at the area just
+		 * behind itself, so that a HeapTuple on the columnar cache can be
+		 * referenced just like a regular one.
+		 */
+		dest = (HeapTuple)((char *)temp + offset);
+		dest->t_data = (HeapTupleHeader)((char *)dest + HEAPTUPLESIZE);
+		memcpy(dest, tuple, HEAPTUPLESIZE);
+		memcpy(dest->t_data, tuple->t_data, tuple->t_len);
+
+		temp->tuples[i] = (HeapTuple)((char *)cchunk + offset);
+	}
+	elog(LOG, "chunk (%p) compaction: freespace %zu -> %zu",
+		 cchunk, cchunk_freespace(cchunk), cchunk_freespace(temp));
+	memcpy(cchunk, temp, shmseg_blocksize);
+	cchunk_sanity_check(cchunk);
+}
+
+/*
+ * ccache_insert_tuple
+ *
+ * It inserts the supplied tuple, with uncached columns dropped off, into
+ * the ccache_head. If no space is left on the target chunk, it expands the
+ * T-tree structure with a newly allocated chunk. If no shared memory is
+ * left at all, it returns false.
+ */
+static void
+do_insert_tuple(ccache_head *ccache, ccache_chunk *cchunk, HeapTuple tuple)
+{
+	HeapTuple	newtup;
+	ItemPointer	ctid = &tuple->t_self;
+	int			i_min = 0;
+	int			i_max = cchunk->ntups;
+	uint32		required = MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+
+	if (required + sizeof(HeapTuple) > cchunk_freespace(cchunk))
+		ccache_chunk_compaction(cchunk);
+	Assert(required + sizeof(HeapTuple) <= cchunk_freespace(cchunk));
+
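+	/* binary search for the insert position, keeping tuples[] sorted by ctid */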
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+
+	if (i_min < cchunk->ntups)
+	{
+		memmove(&cchunk->tuples[i_min + 1],
+				&cchunk->tuples[i_min],
+				sizeof(HeapTuple) * (cchunk->ntups - i_min));
+	}
+	cchunk->usage -= required;
+	newtup = (HeapTuple)(((char *)cchunk) + cchunk->usage);
+	memcpy(newtup, tuple, HEAPTUPLESIZE);
+	newtup->t_data = (HeapTupleHeader)((char *)newtup + HEAPTUPLESIZE);
+	memcpy(newtup->t_data, tuple->t_data, tuple->t_len);
+
+	cchunk->tuples[i_min] = newtup;
+	cchunk->ntups++;
+
+	cchunk_sanity_check(cchunk);
+}
+
+static void
+copy_tuple_properties(HeapTuple newtup, HeapTuple oldtup)
+{
+	ItemPointerCopy(&oldtup->t_self, &newtup->t_self);
+	newtup->t_tableOid = oldtup->t_tableOid;
+	memcpy(&newtup->t_data->t_choice.t_heap,
+		   &oldtup->t_data->t_choice.t_heap,
+		   sizeof(HeapTupleFields));
+	ItemPointerCopy(&oldtup->t_data->t_ctid,
+					&newtup->t_data->t_ctid);
+	newtup->t_data->t_infomask
+		= ((newtup->t_data->t_infomask & ~HEAP_XACT_MASK) |
+		   (oldtup->t_data->t_infomask &  HEAP_XACT_MASK));
+	newtup->t_data->t_infomask2
+		= ((newtup->t_data->t_infomask2 & ~HEAP2_XACT_MASK) |
+		   (oldtup->t_data->t_infomask2 &  HEAP2_XACT_MASK));
+}
+
+static bool
+ccache_insert_tuple_internal(ccache_head *ccache,
+							 ccache_chunk *cchunk,
+							 HeapTuple newtup)
+{
+	ItemPointer		ctid = &newtup->t_self;
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	int				required = MAXALIGN(HEAPTUPLESIZE + newtup->t_len);
+
+	if (cchunk->ntups == 0)
+	{
+		HeapTuple	tup;
+
+		cchunk->usage -= required;
+		cchunk->tuples[0] = tup = (HeapTuple)((char *)cchunk + cchunk->usage);
+		memcpy(tup, newtup, HEAPTUPLESIZE);
+		tup->t_data = (HeapTupleHeader)((char *)tup + HEAPTUPLESIZE);
+		memcpy(tup->t_data, newtup->t_data, newtup->t_len);
+		cchunk->ntups++;
+
+		return true;
+	}
+
+retry:
+	min_ctid = &cchunk->tuples[0]->t_self;
+	max_ctid = &cchunk->tuples[cchunk->ntups - 1]->t_self;
+
+	if (ItemPointerCompare(ctid, min_ctid) < 0)
+	{
+		if (!cchunk->left &&
+			required + sizeof(HeapTuple) <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->left)
+			{
+				cchunk->left = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->left)
+					return false;
+				cchunk->l_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->left, newtup))
+				return false;
+			cchunk->l_depth = TTREE_DEPTH(cchunk->left);
+		}
+	}
+	else if (ItemPointerCompare(ctid, max_ctid) > 0)
+	{
+		if (!cchunk->right &&
+			required + sizeof(HeapTuple) <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, newtup))
+				return false;
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+		}
+	}
+	else
+	{
+		if (required + sizeof(HeapTuple) <= cchunk_freespace(cchunk))
+			do_insert_tuple(ccache, cchunk, newtup);
+		else
+		{
+			HeapTuple	movtup;
+
+			/* push out largest ctid until we get enough space */
+			if (!cchunk->right)
+			{
+				cchunk->right = ccache_alloc_chunk(ccache, cchunk);
+				if (!cchunk->right)
+					return false;
+				cchunk->r_depth = 1;
+			}
+			movtup = cchunk->tuples[cchunk->ntups - 1];
+
+			if (!ccache_insert_tuple_internal(ccache, cchunk->right, movtup))
+				return false;
+
+			cchunk->ntups--;
+			cchunk->deadspace += MAXALIGN(HEAPTUPLESIZE + movtup->t_len);
+			cchunk->r_depth = TTREE_DEPTH(cchunk->right);
+
+			goto retry;
+		}
+	}
+	/* Rebalance the tree, if needed */
+	ccache_rebalance_tree(ccache, cchunk);
+
+	return true;
+}
+
+bool
+ccache_insert_tuple(ccache_head *ccache, Relation rel, HeapTuple tuple)
+{
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	HeapTuple	newtup;
+	Datum	   *cs_values = alloca(sizeof(Datum) * tupdesc->natts);
+	bool	   *cs_isnull = alloca(sizeof(bool) * tupdesc->natts);
+	int			i, j;
+
+	/* remove unreferenced columns */
+	heap_deform_tuple(tuple, tupdesc, cs_values, cs_isnull);
+	for (i=0; i < tupdesc->natts; i++)
+	{
+		j = i + 1 - FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(j, &ccache->attrs_used))
+			cs_isnull[i] = true;
+	}
+	newtup = heap_form_tuple(tupdesc, cs_values, cs_isnull);
+	copy_tuple_properties(newtup, tuple);
+
+	return ccache_insert_tuple_internal(ccache, ccache->root_chunk, newtup);
+}
+
+/*
+ * ccache_find_tuple
+ *
+ * It finds a tuple that satisfies the supplied ItemPointer according to
+ * the ScanDirection. With NoMovementScanDirection, it returns the tuple
+ * that has exactly the same ItemPointer. With ForwardScanDirection, it
+ * returns the tuple with the least ItemPointer greater than the supplied
+ * one, and with BackwardScanDirection, the tuple with the greatest
+ * ItemPointer smaller than the supplied one.
+ */
+HeapTuple
+ccache_find_tuple(ccache_chunk *cchunk, ItemPointer ctid,
+				  ScanDirection direction)
+{
+	ItemPointer		min_ctid;
+	ItemPointer		max_ctid;
+	HeapTuple		tuple = NULL;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+	int				rc;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	min_ctid = &cchunk->tuples[i_min]->t_self;
+	max_ctid = &cchunk->tuples[i_max]->t_self;
+
+	if ((rc = ItemPointerCompare(ctid, min_ctid)) <= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == ForwardScanDirection))
+		{
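+			/*
+			 * This relies on NoMovementScanDirection == 0 and
+			 * ForwardScanDirection == 1; tuples[0] is the exact match,
+			 * tuples[1] is the next one in ctid order.
+			 */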
+			if (cchunk->ntups > direction)
+				return cchunk->tuples[direction];
+		}
+		else
+		{
+			if (cchunk->left)
+				tuple = ccache_find_tuple(cchunk->left, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == ForwardScanDirection)
+				return cchunk->tuples[0];
+			return tuple;
+		}
+	}
+
+	if ((rc = ItemPointerCompare(ctid, max_ctid)) >= 0)
+	{
+		if (rc == 0 && (direction == NoMovementScanDirection ||
+						direction == BackwardScanDirection))
+		{
+			if (i_max + direction >= 0)
+				return cchunk->tuples[i_max + direction];
+		}
+		else
+		{
+			if (cchunk->right)
+				tuple = ccache_find_tuple(cchunk->right, ctid, direction);
+			if (!HeapTupleIsValid(tuple) && direction == BackwardScanDirection)
+				return cchunk->tuples[i_max];
+			return tuple;
+		}
+	}
+
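+	/* binary search for the smallest cached ctid not less than the given one */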
+	while (i_min < i_max)
+	{
+		int	i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) == 0)
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == NoMovementScanDirection)
+			return cchunk->tuples[i_min];
+		else if (direction == ForwardScanDirection)
+		{
+			Assert(i_min + 1 < cchunk->ntups);
+			return cchunk->tuples[i_min + 1];
+		}
+	}
+	else
+	{
+		if (direction == BackwardScanDirection && i_min > 0)
+			return cchunk->tuples[i_min - 1];
+		else if (direction == ForwardScanDirection)
+			return cchunk->tuples[i_min];
+	}
+	return NULL;
+}
+
+/*
+ * ccache_delete_tuple
+ *
+ * It synchronizes the visibility properties of a tuple that is already
+ * cached, usually to reflect its deletion.
+ */
+bool
+ccache_delete_tuple(ccache_head *ccache, HeapTuple oldtup)
+{
+	HeapTuple	tuple;
+
+	tuple = ccache_find_tuple(ccache->root_chunk, &oldtup->t_self,
+							  NoMovementScanDirection);
+	if (!tuple)
+		return false;
+
+	copy_tuple_properties(tuple, oldtup);
+
+	return true;
+}
+
+/*
+ * ccache_merge_right_chunk
+ *
+ * It tries to find the least-greater chunk and merge it into the supplied
+ * chunk, if that makes sense.
+ */
+static bool
+ccache_merge_right_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	long	expected;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+
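+	/* walk down to the leftmost (least) chunk under the given subtree */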
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->left)
+		{
+			target = target->left;
+			continue;
+		}
+
+		/* merge them, if expected new chunk consumes less than 50% */
+		required = cchunk_usedspace(target);
+		expected = cchunk_usedspace(cchunk) + required;
+		if (required + sizeof(HeapTuple) <= cchunk_availablespace(cchunk) &&
+			expected <= shmseg_blocksize / 2)
+		{
+			if (required + sizeof(HeapTuple) > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required + sizeof(HeapTuple) <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[cchunk->ntups++] = newtup;
+			}
+
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->right;
+				upper->r_depth = target->r_depth;
+			}
+			else
+			{
+				upper->left = target->right;
+				upper->l_depth = target->r_depth;
+			}
+			if (target->right)
+				target->right->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xdeadbeaf, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		break;
+	}
+	return result;
+}
+
+/*
+ * ccache_merge_left_chunk
+ *
+ * It tries to find the greatest-lesser chunk and merge it into the supplied
+ * chunk, if that makes sense.
+ */
+static bool
+ccache_merge_left_chunk(ccache_chunk *cchunk, ccache_chunk *target)
+{
+	ccache_chunk   *upper;
+	int		i;
+	long	required;
+	long	expected;
+	bool	result = false;
+
+	cchunk_sanity_check(cchunk);
+
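+	/* walk down to the rightmost (greatest) chunk under the given subtree */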
+	while (target != NULL)
+	{
+		cchunk_sanity_check(target);
+		if (target->right)
+		{
+			target = target->right;
+			continue;
+		}
+
+		/* merge them, if expected new chunk consumes less than 50% */
+		required = cchunk_usedspace(target);
+		expected = cchunk_usedspace(cchunk) + required;
+		if (required + sizeof(HeapTuple) <= cchunk_availablespace(cchunk) &&
+			expected <= shmseg_blocksize / 2)
+		{
+			if (required + sizeof(HeapTuple) > cchunk_freespace(cchunk))
+				ccache_chunk_compaction(cchunk);
+			Assert(required + sizeof(HeapTuple) <= cchunk_freespace(cchunk));
+
+			/* merge contents */
+			memmove(&cchunk->tuples[target->ntups],
+					&cchunk->tuples[0],
+					sizeof(HeapTuple) * cchunk->ntups);
+			cchunk->ntups += target->ntups;
+
+			for (i=0; i < target->ntups; i++)
+			{
+				HeapTuple	oldtup = target->tuples[i];
+				HeapTuple	newtup;
+
+				cchunk->usage -= MAXALIGN(HEAPTUPLESIZE + oldtup->t_len);
+				newtup = (HeapTuple)((char *)cchunk + cchunk->usage);
+				memcpy(newtup, oldtup, HEAPTUPLESIZE);
+				newtup->t_data = (HeapTupleHeader)((char *)newtup +
+												   HEAPTUPLESIZE);
+				memcpy(newtup->t_data, oldtup->t_data, oldtup->t_len);
+				cchunk->tuples[i] = newtup;
+			}
+			/* detach the target chunk */
+			upper = target->upper;
+			Assert(upper != NULL && (upper->right == target ||
+									 upper->left == target));
+			if (upper->right == target)
+			{
+				upper->right = target->left;
+				upper->r_depth = target->l_depth;
+			}
+			else
+			{
+				upper->left = target->left;
+				upper->l_depth = target->l_depth;
+			}
+			if (target->left)
+				target->left->upper = target->upper;
+
+			/* release it */
+			memset(target, 0xfee1dead, shmseg_blocksize);
+			cs_free_shmblock(target);
+
+			cchunk_sanity_check(cchunk);
+			result = true;
+		}
+		cchunk_sanity_check(cchunk);
+		break;
+	}
+	return result;
+}
+
+static ccache_chunk *
+lookup_vacuum_chunk(ccache_chunk *cchunk, ItemPointer ctid, int *p_index)
+{
+	ItemPointer		temp;
+	int				i_min = 0;
+	int				i_max = cchunk->ntups - 1;
+
+	if (cchunk->ntups == 0)
+		return NULL;
+
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) < 0)
+	{
+		if (cchunk->left)
+			return lookup_vacuum_chunk(cchunk->left, ctid, p_index);
+		return NULL;
+	}
+	if (ItemPointerCompare(ctid, &cchunk->tuples[i_max]->t_self) > 0)
+	{
+		if (cchunk->right)
+			return lookup_vacuum_chunk(cchunk->right, ctid, p_index);
+		return NULL;
+	}
+
+	while (i_min < i_max)
+	{
+		int		i_mid = (i_min + i_max) / 2;
+
+		if (ItemPointerCompare(ctid, &cchunk->tuples[i_mid]->t_self) <= 0)
+			i_max = i_mid;
+		else
+			i_min = i_mid + 1;
+	}
+	Assert(i_min == i_max);
+	Assert(i_min == 0 ||
+		   (ItemPointerCompare(ctid, &cchunk->tuples[i_min-1]->t_self) > 0 &&
+			ItemPointerCompare(ctid, &cchunk->tuples[i_min]->t_self) <= 0));
+
+	temp = &cchunk->tuples[i_min]->t_self;
+	if (ItemPointerGetBlockNumber(temp) != ItemPointerGetBlockNumber(ctid))
+		return NULL;
+
+	*p_index = i_min;
+
+	return cchunk;
+}
+
+bool
+ccache_vacuum_page(ccache_head *ccache, Buffer buffer)
+{
+	/* Note that it needs buffer being valid and pinned */
+	BlockNumber		blknum = BufferGetBlockNumber(buffer);
+	OffsetNumber	offnum = FirstOffsetNumber;
+	Page			page = BufferGetPage(buffer);
+	ItemPointerData	ctid;
+	ccache_chunk   *cchunk;
+	int				index;
+	bool			rebalance;
+
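+	/*
+	 * Walk over the line pointers of this heap page, and drop or relocate
+	 * cached tuples whose items were removed or redirected by pruning.
+	 */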
+next:
+	ItemPointerSetBlockNumber(&ctid, blknum);
+	ItemPointerSetOffsetNumber(&ctid, offnum);
+	cchunk = lookup_vacuum_chunk(ccache->root_chunk, &ctid, &index);
+	if (cchunk)
+	{
+		while(index < cchunk->ntups)
+		{
+			HeapTuple	tuple = cchunk->tuples[index];
+			ItemId		itemid;
+
+			/* if it moved to the next block, finish it */
+			if (ItemPointerGetBlockNumber(&tuple->t_self) != blknum)
+				return true;
+
+			/*
+			 * check whether the cached tuple has already moved or
+			 * deleted.
+			 */
+			offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
+			itemid = PageGetItemId(page, offnum);
+
+			if (!ItemIdIsNormal(itemid))
+			{
+				/* find the actual tuple, if redirected */
+				while (ItemIdIsRedirected(itemid))
+					itemid = PageGetItemId(page, ItemIdGetRedirect(itemid));
+
+				/* move the tuple, if redirected and still alive */
+				if (ItemIdIsNormal(itemid))
+				{
+					HeapTuple	 newtup = heap_copytuple(tuple);
+					OffsetNumber offset
+						= (itemid - ((PageHeader)page)->pd_linp) + 1;
+
+					Assert(itemid == PageGetItemId(page, offset));
+
+					ItemPointerSetOffsetNumber(&newtup->t_self, offset);
+
+					if (!ccache_insert_tuple_internal(ccache,
+													  ccache->root_chunk,
+													  newtup))
+						return false;
+					pfree(newtup);
+				}
+				/* remove the old item from cchunk */
+				if (index < cchunk->ntups - 1)
+					memmove(&cchunk->tuples[index],
+							&cchunk->tuples[index+1],
+							sizeof(HeapTuple) * (cchunk->ntups - index - 1));
+				cchunk->ntups--;
+				cchunk->deadspace += MAXALIGN(HEAPTUPLESIZE + tuple->t_len);
+				continue;	/* tuples[index] now holds the following entry */
+			}
+			index++;
+		}
+		rebalance = false;
+		if (cchunk->left)
+			rebalance |= ccache_merge_left_chunk(cchunk, cchunk->left);
+		if (cchunk->right)
+			rebalance |= ccache_merge_right_chunk(cchunk, cchunk->right);
+		if (rebalance)
+			ccache_rebalance_tree(ccache, cchunk);
+	}
+
+	/*
+	 * If we reached the end of the chunk but not the end of the page,
+	 * we need to look up another chunk and vacuum it as well.
+	 */
+	if (++offnum <= PageGetMaxOffsetNumber(page))
+		goto next;
+
+	return true;
+}
+
+static void
+ccache_release_all_chunks(ccache_chunk *cchunk)
+{
+	if (cchunk->left)
+		ccache_release_all_chunks(cchunk->left);
+	if (cchunk->right)
+		ccache_release_all_chunks(cchunk->right);
+	cs_free_shmblock(cchunk);
+}
+
+static void
+track_ccache_locally(ccache_head *ccache)
+{
+	ccache_entry   *entry;
+	dlist_node	   *dnode;
+
+	if (dlist_is_empty(&ccache_free_list))
+	{
+		/*
+		 * If no free ccache_entry is available, we construct a new one
+		 * on demand to track the locally acquired columnar cache.
+		 * Because get/put of the columnar cache is a very frequent job,
+		 * we allocate tracking entries in TopMemoryContext for reuse,
+		 * instead of allocating one for each operation.
+		 */
+		PG_TRY();
+		{
+			entry = MemoryContextAlloc(TopMemoryContext,
+									   sizeof(ccache_entry));
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+		}
+		PG_CATCH();
+		{
+			cs_put_ccache(ccache);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+	dnode = dlist_pop_head_node(&ccache_free_list);
+	entry = dlist_container(ccache_entry, chain, dnode);
+	entry->owner = CurrentResourceOwner;
+	entry->ccache = ccache;
+	dlist_push_tail(&ccache_local_list, &entry->chain);
+}
+
+void
+untrack_ccache_locally(ccache_head *ccache)
+{
+	dlist_mutable_iter	iter;
+
+	dlist_foreach_modify(iter, &ccache_local_list)
+	{
+		ccache_entry *entry
+			= dlist_container(ccache_entry, chain, iter.cur);
+
+		if (entry->ccache == ccache &&
+			entry->owner == CurrentResourceOwner)
+		{
+			dlist_delete(&entry->chain);
+			dlist_push_tail(&ccache_free_list, &entry->chain);
+			return;
+		}
+	}
+}
+
+static void
+cs_put_ccache_nolock(ccache_head *ccache)
+{
+	Assert(ccache->refcnt > 0);
+	if (--ccache->refcnt == 0)
+	{
+		dlist_delete(&ccache->hash_chain);
+		dlist_delete(&ccache->lru_chain);
+		ccache_release_all_chunks(ccache->root_chunk);
+		dlist_push_head(&cs_ccache_hash->free_list, &ccache->hash_chain);
+	}
+	untrack_ccache_locally(ccache);
+}
+
+void
+cs_put_ccache(ccache_head *cache)
+{
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	cs_put_ccache_nolock(cache);
+	SpinLockRelease(&cs_ccache_hash->lock);
+}
+
+static ccache_head *
+cs_create_ccache(Oid tableoid, Bitmapset *attrs_used)
+{
+	ccache_head	   *temp;
+	ccache_head	   *new_cache;
+	dlist_node	   *dnode;
+
+	/*
+	 * There is no columnar cache of this relation, or the cached attributes
+	 * are not sufficient to run the required query. So, we try to create a
+	 * new ccache_head for the upcoming cache-scan.
+	 * Also allocate a batch of ccache_head entries if none are free any more.
+	 */
+	if (dlist_is_empty(&cs_ccache_hash->free_list))
+	{
+		char   *buffer;
+		int		offset;
+		int		nwords, size;
+
+		buffer = cs_alloc_shmblock();
+		if (!buffer)
+			return NULL;
+
+		nwords = (max_cached_attnum - FirstLowInvalidHeapAttributeNumber +
+				  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+		size = MAXALIGN(offsetof(ccache_head,
+								 attrs_used.words[nwords + 1]));
+		for (offset = 0; offset <= shmseg_blocksize - size; offset += size)
+		{
+			temp = (ccache_head *)(buffer + offset);
+
+			dlist_push_tail(&cs_ccache_hash->free_list, &temp->hash_chain);
+		}
+	}
+	dnode = dlist_pop_head_node(&cs_ccache_hash->free_list);
+	new_cache = dlist_container(ccache_head, hash_chain, dnode);
+
+	LWLockInitialize(&new_cache->lock, 0);
+	new_cache->refcnt = 1;
+	new_cache->status = CCACHE_STATUS_INITIALIZED;
+
+	new_cache->tableoid = tableoid;
+	new_cache->root_chunk = ccache_alloc_chunk(new_cache, NULL);
+	if (!new_cache->root_chunk)
+	{
+		dlist_push_head(&cs_ccache_hash->free_list, &new_cache->hash_chain);
+		return NULL;
+	}
+
+	if (attrs_used)
+		memcpy(&new_cache->attrs_used, attrs_used,
+			   offsetof(Bitmapset, words[attrs_used->nwords]));
+	else
+	{
+		new_cache->attrs_used.nwords = 1;
+		new_cache->attrs_used.words[0] = 0;
+	}
+	return new_cache;
+}
+
+ccache_head *
+cs_get_ccache(Oid tableoid, Bitmapset *attrs_used, bool create_on_demand)
+{
+	Datum			hash = hash_any((unsigned char *)&tableoid, sizeof(Oid));
+	Index			i = hash % ccache_hash_size;
+	dlist_iter		iter;
+	ccache_head	   *old_cache = NULL;
+	ccache_head	   *new_cache = NULL;
+	ccache_head	   *temp;
+
+	SpinLockAcquire(&cs_ccache_hash->lock);
+	PG_TRY();
+	{
+		/*
+		 * Try to find an existing ccache that covers all the columns
+		 * referenced in this query.
+		 */
+		dlist_foreach(iter, &cs_ccache_hash->slots[i])
+		{
+			temp = dlist_container(ccache_head, hash_chain, iter.cur);
+
+			if (tableoid != temp->tableoid)
+				continue;
+
+			if (bms_is_subset(attrs_used, &temp->attrs_used))
+			{
+				temp->refcnt++;
+				if (create_on_demand)
+					dlist_move_head(&cs_ccache_hash->lru_list,
+									&temp->lru_chain);
+				new_cache = temp;
+				goto out_unlock;
+			}
+			old_cache = temp;
+			break;
+		}
+
+		if (create_on_demand)
+		{
+			/* choose a set of columns to be cached */
+			if (old_cache)
+				attrs_used = ccache_new_attribute_set(tableoid,
+													  attrs_used,
+													  &old_cache->attrs_used);
+
+			new_cache = cs_create_ccache(tableoid, attrs_used);
+			if (!new_cache)
+				goto out_unlock;
+
+			dlist_push_head(&cs_ccache_hash->slots[i], &new_cache->hash_chain);
+			dlist_push_head(&cs_ccache_hash->lru_list, &new_cache->lru_chain);
+			if (old_cache)
+				cs_put_ccache_nolock(old_cache);
+		}
+	}
+	PG_CATCH();
+	{
+		SpinLockRelease(&cs_ccache_hash->lock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+out_unlock:
+	SpinLockRelease(&cs_ccache_hash->lock);
+
+	if (new_cache)
+		track_ccache_locally(new_cache);
+
+	return new_cache;
+}
+
+typedef struct {
+	Oid				tableoid;
+	int				status;
+	ccache_chunk   *cchunk;
+	ccache_chunk   *upper;
+	ccache_chunk   *right;
+	ccache_chunk   *left;
+	int				r_depth;
+	int				l_depth;
+	uint32			ntups;
+	uint32			usage;
+	ItemPointerData	min_ctid;
+	ItemPointerData	max_ctid;
+} ccache_status;
+
+static List *
+cache_scan_debuginfo_internal(ccache_head *ccache,
+							  ccache_chunk *cchunk, List *result)
+{
+	ccache_status  *cstatus = palloc0(sizeof(ccache_status));
+	List		   *temp;
+
+	if (cchunk->left)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->left, NIL);
+		result = list_concat(result, temp);
+	}
+	cstatus->tableoid = ccache->tableoid;
+	cstatus->status   = ccache->status;
+	cstatus->cchunk   = cchunk;
+	cstatus->upper    = cchunk->upper;
+	cstatus->right    = cchunk->right;
+	cstatus->left     = cchunk->left;
+	cstatus->r_depth  = cchunk->r_depth;
+	cstatus->l_depth  = cchunk->l_depth;
+	cstatus->ntups    = cchunk->ntups;
+	cstatus->usage    = cchunk->usage;
+	if (cchunk->ntups > 0)
+	{
+		ItemPointerCopy(&cchunk->tuples[0]->t_self,
+						&cstatus->min_ctid);
+		ItemPointerCopy(&cchunk->tuples[cchunk->ntups - 1]->t_self,
+						&cstatus->max_ctid);
+	}
+	else
+	{
+		ItemPointerSet(&cstatus->min_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+		ItemPointerSet(&cstatus->max_ctid,
+					   InvalidBlockNumber,
+					   InvalidOffsetNumber);
+	}
+	result = lappend(result, cstatus);
+
+	if (cchunk->right)
+	{
+		temp = cache_scan_debuginfo_internal(ccache, cchunk->right, NIL);
+		result = list_concat(result, temp);
+	}
+	return result;
+}
+
+/*
+ * cache_scan_debuginfo
+ *
+ * It shows the current status of ccache_chunks being allocated.
+ */
+Datum
+cache_scan_debuginfo(PG_FUNCTION_ARGS)
+{
+	FuncCallContext	*fncxt;
+	List	   *cstatus_list;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc		tupdesc;
+		MemoryContext	oldcxt;
+		int				i;
+		dlist_iter		iter;
+		List		   *result = NIL;
+
+		fncxt = SRF_FIRSTCALL_INIT();
+		oldcxt = MemoryContextSwitchTo(fncxt->multi_call_memory_ctx);
+
+		/* make definition of tuple-descriptor */
+		tupdesc = CreateTemplateTupleDesc(12, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "tableoid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "upper",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "l_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "l_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "r_depth",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "r_chunk",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "ntuples",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)10, "usage",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)11, "min_ctid",
+						   TIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber)12, "max_ctid",
+						   TIDOID, -1, 0);
+		fncxt->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/* make a snapshot of the current table cache */
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		for (i=0; i < ccache_hash_size; i++)
+		{
+			dlist_foreach(iter, &cs_ccache_hash->slots[i])
+			{
+				ccache_head	*ccache
+					= dlist_container(ccache_head, hash_chain, iter.cur);
+
+				ccache->refcnt++;
+				SpinLockRelease(&cs_ccache_hash->lock);
+				track_ccache_locally(ccache);
+
+				LWLockAcquire(&ccache->lock, LW_SHARED);
+				result = cache_scan_debuginfo_internal(ccache,
+													   ccache->root_chunk,
+													   result);
+				LWLockRelease(&ccache->lock);
+
+				SpinLockAcquire(&cs_ccache_hash->lock);
+				cs_put_ccache_nolock(ccache);
+			}
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		fncxt->user_fctx = result;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	fncxt = SRF_PERCALL_SETUP();
+
+	cstatus_list = (List *)fncxt->user_fctx;
+	if (cstatus_list != NIL &&
+		fncxt->call_cntr < cstatus_list->length)
+	{
+		ccache_status *cstatus = list_nth(cstatus_list, fncxt->call_cntr);
+		Datum		values[12];
+		bool		isnull[12];
+		HeapTuple	tuple;
+
+		memset(isnull, false, sizeof(isnull));
+		values[0] = ObjectIdGetDatum(cstatus->tableoid);
+		if (cstatus->status == CCACHE_STATUS_INITIALIZED)
+			values[1] = CStringGetTextDatum("initialized");
+		else if (cstatus->status == CCACHE_STATUS_IN_PROGRESS)
+			values[1] = CStringGetTextDatum("in-progress");
+		else if (cstatus->status == CCACHE_STATUS_CONSTRUCTED)
+			values[1] = CStringGetTextDatum("constructed");
+		else
+			values[1] = CStringGetTextDatum("unknown");
+		values[2] = CStringGetTextDatum(psprintf("%p", cstatus->cchunk));
+		values[3] = CStringGetTextDatum(psprintf("%p", cstatus->upper));
+		values[4] = Int32GetDatum(cstatus->l_depth);
+		values[5] = CStringGetTextDatum(psprintf("%p", cstatus->left));
+		values[6] = Int32GetDatum(cstatus->r_depth);
+		values[7] = CStringGetTextDatum(psprintf("%p", cstatus->right));
+		values[8] = Int32GetDatum(cstatus->ntups);
+		values[9] = Int32GetDatum(cstatus->usage);
+
+		if (ItemPointerIsValid(&cstatus->min_ctid))
+			values[10] = PointerGetDatum(&cstatus->min_ctid);
+		else
+			isnull[10] = true;
+		if (ItemPointerIsValid(&cstatus->max_ctid))
+			values[11] = PointerGetDatum(&cstatus->max_ctid);
+		else
+			isnull[11] = true;
+
+		tuple = heap_form_tuple(fncxt->tuple_desc, values, isnull);
+
+		SRF_RETURN_NEXT(fncxt, HeapTupleGetDatum(tuple));
+	}
+	SRF_RETURN_DONE(fncxt);
+}
+PG_FUNCTION_INFO_V1(cache_scan_debuginfo);
+
+/*
+ * cs_alloc_shmblock
+ *
+ * It allocates a fixed-length block. This routine intentionally does not
+ * support variable-length allocation, to keep the logic simple for its purpose.
+ */
+static void *
+cs_alloc_shmblock(void)
+{
+	ccache_head	   *ccache;
+	dlist_node	   *dnode;
+	void		   *address = NULL;
+	int				index;
+	int				retry = 2;
+
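+	/*
+	 * If no free block is left, try to reclaim blocks by releasing the
+	 * least-recently-used columnar cache, retrying a limited number of
+	 * times before giving up.
+	 */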
+do_retry:
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	if (dlist_is_empty(&cs_shmseg_head->free_list) && retry-- > 0)
+	{
+		SpinLockRelease(&cs_shmseg_head->lock);
+
+		SpinLockAcquire(&cs_ccache_hash->lock);
+		if (!dlist_is_empty(&cs_ccache_hash->lru_list))
+		{
+			dnode = dlist_tail_node(&cs_ccache_hash->lru_list);
+			ccache = dlist_container(ccache_head, lru_chain, dnode);
+
+			pg_memory_barrier();
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache_nolock(ccache);
+			else
+				dlist_move_head(&cs_ccache_hash->lru_list, &ccache->lru_chain);
+		}
+		SpinLockRelease(&cs_ccache_hash->lock);
+
+		goto do_retry;
+	}
+
+	if (!dlist_is_empty(&cs_shmseg_head->free_list))
+	{
+		dnode = dlist_pop_head_node(&cs_shmseg_head->free_list);
+
+		index = dnode - cs_shmseg_head->blocks;
+		Assert(index >= 0 && index < shmseg_num_blocks);
+
+		memset(dnode, 0, sizeof(dlist_node));
+		address = (void *)((char *)cs_shmseg_head->base_address + 
+						   index * shmseg_blocksize);
+	}
+	SpinLockRelease(&cs_shmseg_head->lock);
+
+	return address;
+}
+
+/*
+ * cs_free_shmblock
+ *
+ * It releases a block previously allocated by cs_alloc_shmblock.
+ */
+static void
+cs_free_shmblock(void *address)
+{
+	Size		curr = (Size) address;
+	Size		base = cs_shmseg_head->base_address;
+	ulong		index;
+	dlist_node *dnode;
+
+	Assert((curr - base) % shmseg_blocksize == 0);
+	Assert(curr >= base && curr < base + shmseg_num_blocks * shmseg_blocksize);
+	index = (curr - base) / shmseg_blocksize;
+
+	SpinLockAcquire(&cs_shmseg_head->lock);
+	dnode = &cs_shmseg_head->blocks[index];
+	Assert(dnode->prev == NULL && dnode->next == NULL);
+
+	dlist_push_head(&cs_shmseg_head->free_list, dnode);
+
+	SpinLockRelease(&cs_shmseg_head->lock);
+}
+
+static void
+ccache_setup(void)
+{
+	int		i;
+	bool	found;
+
+	/* allocation of a shared memory segment for table's hash */
+	cs_ccache_hash
+		= ShmemInitStruct("cache_scan: hash of columnar cache",
+						  MAXALIGN(offsetof(ccache_hash,
+											slots[ccache_hash_size])),
+						  &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_ccache_hash->lock);
+	dlist_init(&cs_ccache_hash->lru_list);
+	dlist_init(&cs_ccache_hash->free_list);
+	for (i=0; i < ccache_hash_size; i++)
+		dlist_init(&cs_ccache_hash->slots[i]);
+
+	/* allocation of a shared memory segment for columnar cache */
+	cs_shmseg_head = ShmemInitStruct("cache_scan: columnar cache",
+									 offsetof(shmseg_head,
+											  blocks[shmseg_num_blocks]) +
+									 (Size)shmseg_num_blocks *
+									 (Size)shmseg_blocksize,
+									 &found);
+	Assert(!found);
+
+	SpinLockInit(&cs_shmseg_head->lock);
+	dlist_init(&cs_shmseg_head->free_list);
+	cs_shmseg_head->base_address
+		= MAXALIGN(&cs_shmseg_head->blocks[shmseg_num_blocks]);
+	for (i=0; i < shmseg_num_blocks; i++)
+	{
+		dlist_push_tail(&cs_shmseg_head->free_list,
+						&cs_shmseg_head->blocks[i]);
+	}
+}
+
+void
+ccache_init(void)
+{
+	/* setup GUC variables */
+	DefineCustomIntVariable("cache_scan.block_size",
+							"block size of in-memory columnar cache",
+							NULL,
+							&shmseg_blocksize,
+							2048 * 1024,	/* 2MB */
+							1024 * 1024,	/* 1MB */
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+	if ((shmseg_blocksize & (shmseg_blocksize - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cache_scan.block_size must be power of 2")));
+
+	DefineCustomIntVariable("cache_scan.num_blocks",
+							"number of in-memory columnar cache blocks",
+							NULL,
+							&shmseg_num_blocks,
+							64,
+							64,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.hash_size",
+							"number of hash slots for columnar cache",
+							NULL,
+							&ccache_hash_size,
+							128,
+							128,
+							INT_MAX,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	DefineCustomIntVariable("cache_scan.max_cached_attnum",
+							"max attribute number we can cache",
+							NULL,
+							&max_cached_attnum,
+							128,
+							sizeof(bitmapword) * BITS_PER_BYTE,
+							2048,
+							PGC_SIGHUP,
+							GUC_NOT_IN_SAMPLE,
+							NULL, NULL, NULL);
+
+	/* request shared memory segment for table's cache */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(ccache_hash)) +
+						   MAXALIGN(sizeof(dlist_head) * ccache_hash_size) +
+						   MAXALIGN(sizeof(LWLockId) * ccache_hash_size) +
+						   MAXALIGN(offsetof(shmseg_head,
+											 blocks[shmseg_num_blocks])) +
+						   (Size)shmseg_num_blocks * (Size)shmseg_blocksize);
+
+	shmem_startup_next = shmem_startup_hook;
+	shmem_startup_hook = ccache_setup;
+
+	/* register resource-release callback */
+	dlist_init(&ccache_local_list);
+	dlist_init(&ccache_free_list);
+	RegisterResourceReleaseCallback(ccache_on_resource_release, NULL);
+}
diff --git a/contrib/cache_scan/cscan.c b/contrib/cache_scan/cscan.c
new file mode 100644
index 0000000..a9e24ec
--- /dev/null
+++ b/contrib/cache_scan/cscan.c
@@ -0,0 +1,1163 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/cache_scan/cscan.c
+ *
+ * An extension that offers an alternative way to scan a table utilizing a
+ * column-oriented database cache.
+ *
+ * Copyright (c) 2010-2013, PostgreSQL Global Development Group
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_language.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_trigger.h"
+#include "commands/explain.h"
+#include "commands/trigger.h"
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/var.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/guc.h"
+#include "utils/spccache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "cache_scan.h"
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+/* Type declarations */
+typedef struct
+{
+	CustomPath		cpath;
+	Bitmapset	   *attrs_used;
+} CScanPath;
+
+typedef struct
+{
+	CustomPlan		cplan;
+	Index			scanrelid;
+	Bitmapset	   *attrs_used;
+} CScanPlan;
+
+/* Static variables */
+static add_scan_path_hook_type		add_scan_path_next = NULL;
+static object_access_hook_type		object_access_next = NULL;
+static heap_page_prune_hook_type	heap_page_prune_next = NULL;
+static CustomPathMethods			cache_scan_path_methods;
+static CustomPlanMethods			cache_scan_plan_methods;
+
+static bool		cache_scan_enabled;
+static double	cache_scan_width_threshold;
+
+static bool
+cs_estimate_costs(PlannerInfo *root,
+                  RelOptInfo *baserel,
+				  Relation rel,
+                  CustomPath *cpath,
+				  Bitmapset **attrs_used)
+{
+	ListCell	   *lc;
+	ccache_head	   *ccache;
+	Oid				tableoid = RelationGetRelid(rel);
+	TupleDesc		tupdesc = RelationGetDescr(rel);
+	double			hit_ratio;
+	Cost			run_cost = 0.0;
+	Cost			startup_cost = 0.0;
+	double			tablespace_page_cost;
+	QualCost		qpqual_cost;
+	Cost			cpu_per_tuple;
+	int				i;
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Collect all the columns being used */
+	pull_varattnos((Node *) baserel->reltargetlist,
+				   baserel->relid,
+				   attrs_used);
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) lfirst(lc);
+
+		pull_varattnos((Node *) rinfo->clause,
+					   baserel->relid,
+					   attrs_used);
+	}
+
+	for (i=FirstLowInvalidHeapAttributeNumber + 1; i <= 0; i++)
+	{
+		int		attidx = i - FirstLowInvalidHeapAttributeNumber;
+
+		if (bms_is_member(attidx, *attrs_used))
+		{
+			/* oid and whole-row references are not supported */
+			if (i == ObjectIdAttributeNumber || i == InvalidAttrNumber)
+				return false;
+
+			/* clear system attributes from the bitmap */
+			*attrs_used = bms_del_member(*attrs_used, attidx);
+		}
+	}
+
+	/*
+	 * Because of layout on the shared memory segment, we have to restrict
+	 * the largest attribute number in use to prevent overrun by growth of
+	 * Bitmapset.
+	 */
+	if (*attrs_used &&
+		(*attrs_used)->nwords > ccache_max_attribute_number())
+		return false;
+
+	/*
+	 * Try to get an existing cache. If one exists, we assume the cache will
+	 * probably still be available at the time this plan is executed.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(rel), *attrs_used, false);
+	if (!ccache)
+	{
+		double	usage_ratio;
+		int		total_width = 0;
+		int		tuple_width = 0;
+
+		/*
+		 * Estimate the relative width of the columns to be cached - it does
+		 * not make sense to construct a new cache if their width exceeds the
+		 * configured threshold; usually 30%.
+		 */
+		for (i=0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = tupdesc->attrs[i];
+			int		attidx = i + 1 - FirstLowInvalidHeapAttributeNumber;
+			int		width;
+
+			if (attr->attlen > 0)
+				width = attr->attlen;
+			else
+				width = get_attavgwidth(tableoid, attr->attnum);
+
+			total_width += width;
+			if (bms_is_member(attidx, *attrs_used))
+				tuple_width += width;
+		}
+		usage_ratio = (double)tuple_width / (double)total_width;
+		if (usage_ratio > cache_scan_width_threshold / 100.0)
+			return false;
+
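+		/* no cache exists yet, so most pages are assumed to come from the heap */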
+		hit_ratio = 0.05;
+	}
+	else
+	{
+		/*
+		 * If an existing cache already holds all the required attributes,
+		 * we don't need to care about the width of the cached columns
+		 * (it is obviously below the threshold already).
+		 */
+		hit_ratio = 0.95;
+		cs_put_ccache(ccache);
+	}
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &tablespace_page_cost);
+	/* Disk costs */
+	run_cost += (1.0 - hit_ratio) * tablespace_page_cost * baserel->pages;
+
+	/* CPU costs (logic copied from get_restriction_qual_cost) */
+	if (cpath->path.param_info)
+	{
+		/* Include costs of pushed-down clauses */
+		cost_qual_eval(&qpqual_cost,
+					   cpath->path.param_info->ppi_clauses,
+					   root);
+		qpqual_cost.startup += baserel->baserestrictcost.startup;
+		qpqual_cost.per_tuple += baserel->baserestrictcost.per_tuple;
+	}
+	else
+		qpqual_cost = baserel->baserestrictcost;
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+
+	return true;
+}
+
+/*
+ * ccache_new_attribute_set
+ *
+ * It selects the attributes to be cached. If some of the newly required
+ * attributes are not cached yet, we reconstruct the cache with the union
+ * set of attributes, as long as its width does not grow beyond the
+ * configured threshold. If the (required | existing) set is wider than
+ * the threshold, we drop attributes from (~required & existing).
+ * Usually, the total width of the required columns is less than the
+ * threshold because of the checks at the planner stage.
+ */
+Bitmapset *
+ccache_new_attribute_set(Oid tableoid,
+						 Bitmapset *required, Bitmapset *existing)
+{
+	Form_pg_class	relform;
+	HeapTuple		reltup;
+	Bitmapset	   *difference;
+	int			   *attrs_width;
+	int				i, anum;
+	int				total_width;
+	int				required_width;
+	int				union_width;
+	double			usage_ratio;
+
+	reltup = SearchSysCache1(RELOID, ObjectIdGetDatum(tableoid));
+	if (!HeapTupleIsValid(reltup))
+		elog(ERROR, "cache lookup failed for relation %u", tableoid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	attrs_width = palloc0(sizeof(int) * relform->relnatts);
+
+	total_width = 0;
+	required_width = 0;
+	union_width = 0;
+	for (anum = 1; anum <= relform->relnatts; anum++)
+	{
+		Form_pg_attribute	attform;
+		HeapTuple			atttup;
+
+		atttup = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(tableoid),
+								 Int16GetDatum(anum));
+		if (!HeapTupleIsValid(atttup))
+			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+				 anum, tableoid);
+		attform = (Form_pg_attribute) GETSTRUCT(atttup);
+
+		if (attform->attisdropped)
+		{
+			ReleaseSysCache(atttup);
+			continue;
+		}
+
+		if (attform->attlen > 0)
+			attrs_width[anum - 1] = attform->attlen;
+		else
+			attrs_width[anum - 1] = get_attavgwidth(tableoid, anum);
+
+		total_width += attrs_width[anum - 1];
+		i = anum - FirstLowInvalidHeapAttributeNumber;
+		if (bms_is_member(i, required))
+		{
+			required_width += attrs_width[anum - 1];
+			union_width += attrs_width[anum - 1];
+		}
+		else if (bms_is_member(i, existing))
+			union_width += attrs_width[anum - 1];
+
+		ReleaseSysCache(atttup);
+	}
+	ReleaseSysCache(reltup);
+
+	/*
+	 * An easy case: if the width of the union set is still below the
+	 * threshold, we don't need to drop any columns; just take the union.
+	 */
+	usage_ratio = (double) union_width / (double) total_width;
+	if (usage_ratio <= cache_scan_width_threshold / 100.0)
+		return bms_union(required, existing);
+
+	/*
+	 * Otherwise, we repeatedly drop the widest column that is not
+	 * referenced by the upcoming query, until the width of the cache
+	 * falls below the threshold.
+	 */
+	difference = bms_difference(existing, required);
+	do {
+		Bitmapset  *tempset = bms_copy(difference);
+		int			maxwidth = -1;
+		AttrNumber	maxwidth_anum = 0;
+
+		Assert(!bms_is_empty(tempset));
+		union_width = required_width;
+		while ((i = bms_first_member(tempset)) >= 0)
+		{
+			anum = i + FirstLowInvalidHeapAttributeNumber;
+
+			union_width += attrs_width[anum - 1];
+			if (attrs_width[anum - 1] > maxwidth)
+			{
+				maxwidth = attrs_width[anum - 1];
+				maxwidth_anum = anum;
+			}
+		}
+		pfree(tempset);
+
+		/* drop a column that has largest length */
+		Assert(maxwidth_anum > 0);
+		i = maxwidth_anum - FirstLowInvalidHeapAttributeNumber;
+		difference = bms_del_member(difference, i);
+		union_width -= maxwidth;
+
+		usage_ratio = (double) union_width / (double) total_width;
+	} while (usage_ratio > cache_scan_width_threshold / 100.0);
+
+	pfree(attrs_width);
+
+	return bms_union(required, difference);
+}
+
+/*
+ * cs_relation_has_synchronizer
+ *
+ * A table that can have a columnar cache also needs to have triggers for
+ * synchronization, to ensure the on-memory cache keeps the latest contents
+ * of the heap. It returns TRUE if the supplied relation has triggers that
+ * invoke cache_scan_synchronizer in the appropriate contexts; otherwise
+ * FALSE is returned.
+ */
+static bool
+cs_relation_has_synchronizer(Relation rel)
+{
+	int		i, numtriggers;
+	bool	has_on_insert_synchronizer = false;
+	bool	has_on_update_synchronizer = false;
+	bool	has_on_delete_synchronizer = false;
+	bool	has_on_truncate_synchronizer = false;
+
+	if (!rel->trigdesc)
+		return false;
+
+	numtriggers = rel->trigdesc->numtriggers;
+	for (i=0; i < numtriggers; i++)
+	{
+		Trigger	   *trig = rel->trigdesc->triggers + i;
+		HeapTuple	tup;
+
+		if (!trig->tgenabled)
+			continue;
+
+		tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(trig->tgfoid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for function %u", trig->tgfoid);
+
+		if (((Form_pg_proc) GETSTRUCT(tup))->prolang == ClanguageId)
+		{
+			Datum	value;
+			bool	isnull;
+			char   *prosrc;
+			char   *probin;
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_prosrc, &isnull);
+			if (isnull)
+				elog(ERROR, "null prosrc for C function %u", trig->tgoid);
+			prosrc = TextDatumGetCString(value);
+
+			value = SysCacheGetAttr(PROCOID, tup,
+									Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				elog(ERROR, "null probin for C function %u", trig->tgoid);
+			probin = TextDatumGetCString(value);
+
+			if (strcmp(prosrc, "cache_scan_synchronizer") == 0 &&
+				strcmp(probin, "$libdir/cache_scan") == 0)
+			{
+				int16		tgtype = trig->tgtype;
+
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_INSERT))
+					has_on_insert_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_UPDATE))
+					has_on_update_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_ROW,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_DELETE))
+					has_on_delete_synchronizer = true;
+				if (TRIGGER_TYPE_MATCHES(tgtype,
+										 TRIGGER_TYPE_STATEMENT,
+										 TRIGGER_TYPE_AFTER,
+										 TRIGGER_TYPE_TRUNCATE))
+					has_on_truncate_synchronizer = true;
+			}
+			pfree(prosrc);
+			pfree(probin);
+		}
+		ReleaseSysCache(tup);
+	}
+
+	if (has_on_insert_synchronizer &&
+		has_on_update_synchronizer &&
+		has_on_delete_synchronizer &&
+		has_on_truncate_synchronizer)
+		return true;
+	return false;
+}
+
+
+static void
+cs_add_scan_path(PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 RangeTblEntry *rte)
+{
+	Relation		rel;
+
+	/* call the secondary hook if exist */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* Is this feature available now? */
+	if (!cache_scan_enabled)
+		return;
+
+	/* Only regular tables can be cached */
+	if (baserel->reloptkind != RELOPT_BASEREL ||
+		rte->rtekind != RTE_RELATION)
+		return;
+
+	/* Core code should have already acquired an appropriate lock */
+	rel = heap_open(rte->relid, NoLock);
+
+	if (cs_relation_has_synchronizer(rel))
+	{
+		CScanPath  *cspath = palloc0(sizeof(CScanPath));
+		Relids		required_outer;
+		Bitmapset  *attrs_used = NULL;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a cache
+		 * scan, but it could still have required parameterization due to
+		 * LATERAL refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cspath->cpath.path.type = T_CustomPath;
+		cspath->cpath.path.pathtype = T_CustomPlan;
+		cspath->cpath.path.parent = baserel;
+		cspath->cpath.path.param_info
+			= get_baserel_parampathinfo(root, baserel, required_outer);
+		cspath->cpath.methods = &cache_scan_path_methods;
+		if (cs_estimate_costs(root, baserel, rel, &cspath->cpath, &attrs_used))
+		{
+			cspath->attrs_used = attrs_used;
+			add_path(baserel, &cspath->cpath.path);
+		}
+		else
+			pfree(cspath);
+	}
+	heap_close(rel, NoLock);
+}
+
+static CustomPlan *
+cs_create_custom_plan(PlannerInfo *root, CustomPath *custom_path)
+{
+	CScanPlan	   *csplan;
+	RelOptInfo	   *rel = custom_path->path.parent;
+	Index			scanrelid = rel->relid;
+	Bitmapset	   *attrs_used = ((CScanPath *) custom_path)->attrs_used;
+	List		   *tlist = NIL;
+	List		   *clauses;
+	RangeTblEntry  *rte;
+
+	/* make up targetlist */
+	if (use_physical_tlist(root, rel))
+		tlist = build_physical_tlist(root, rel);
+	if (tlist == NIL)
+		tlist = build_path_tlist(root, &custom_path->path);
+
+	/* make up scan clauses */
+	clauses = rel->baserestrictinfo;
+	if (custom_path->path.param_info)
+		clauses = list_concat(list_copy(clauses),
+							  custom_path->path.param_info->ppi_clauses);
+	clauses = order_qual_clauses(root, clauses);
+	clauses = extract_actual_clauses(clauses, false);
+	if (custom_path->path.param_info)
+		clauses = (List *) replace_nestloop_params(root, (Node *) clauses);
+
+	/* it should be a base rel... */
+	Assert(scanrelid > 0);
+	Assert(rel->rtekind == RTE_RELATION);
+	rte = planner_rt_fetch(scanrelid, root);
+	Assert(rte->rtekind == RTE_RELATION);
+
+	/* make a CScanPlan node */
+	csplan = palloc0(sizeof(CScanPlan));
+	csplan->cplan.plan.type = T_CustomPlan;
+	csplan->cplan.plan.targetlist = tlist;
+	csplan->cplan.plan.qual = clauses;
+	csplan->cplan.methods = &cache_scan_plan_methods;
+	csplan->scanrelid = scanrelid;
+	csplan->attrs_used = bms_copy(attrs_used);
+
+	return &csplan->cplan;
+}
+
+static void
+cs_textout_custom_path(StringInfo str, Node *node)
+{
+	CScanPath  *cspath = (CScanPath *) node;
+	Bitmapset  *tmpset;
+	int			x;
+
+	appendStringInfoChar(str, '(');
+	appendStringInfoChar(str, 'b');
+	tmpset = bms_copy(cspath->attrs_used);
+	while ((x = bms_first_member(tmpset)) >= 0)
+		appendStringInfo(str, " %d", x);
+	bms_free(tmpset);
+	appendStringInfoChar(str, ')');
+}
+
+static void
+cs_set_custom_plan_ref(PlannerInfo *root,
+					   CustomPlan *custom_plan,
+					   int rtoffset)
+{
+	CScanPlan  *csplan = (CScanPlan *) custom_plan;
+
+	csplan->scanrelid += rtoffset;
+	csplan->cplan.plan.targetlist = (List *)
+		fix_scan_expr(root, (Node *) csplan->cplan.plan.targetlist, rtoffset);
+	csplan->cplan.plan.qual = (List *)
+		fix_scan_expr(root, (Node *) csplan->cplan.plan.qual, rtoffset);
+}
+
+static void
+cs_finalize_custom_plan(PlannerInfo *root,
+						CustomPlan *custom_plan,
+						Bitmapset **paramids,
+						Bitmapset **valid_params,
+						Bitmapset **scan_params)
+{
+	*paramids = bms_add_members(*paramids, *scan_params);
+}
+
+typedef struct
+{
+	CustomPlanState	cps;
+	Relation		curr_rel;
+	HeapScanDesc	curr_scan;
+	TupleTableSlot *curr_slot;
+	ccache_head	   *ccache;
+	ItemPointerData	curr_ctid;
+	Bitmapset	   *attrs_used;
+	bool			normal_seqscan;
+	bool			with_construction;
+} CScanState;
+
+static CustomPlanState *
+cs_begin_custom_plan(CustomPlan *node, EState *estate, int eflags)
+{
+	CScanState	   *csstate;
+	CScanPlan	   *csplan = (CScanPlan *) node;
+	Relation		rel;
+	HeapScanDesc	scandesc = NULL;
+	Index			scanrelid = csplan->scanrelid;
+	Bitmapset	   *attrs_used;
+	ccache_head	   *ccache;
+
+	/* construct CScanState */
+	csstate = palloc0(sizeof(CScanState));
+	csstate->cps.ps.type = T_CustomPlanState;
+	csstate->cps.ps.plan = &node->plan;
+	csstate->cps.ps.state = estate;
+	csstate->cps.methods = &cache_scan_plan_methods;
+
+	/* create expression context for node */
+	ExecAssignExprContext(estate, &csstate->cps.ps);
+
+	/* initialize child expressions */
+	csstate->cps.ps.targetlist = (List *)
+		ExecInitExpr((Expr *) node->plan.targetlist, &csstate->cps.ps);
+	csstate->cps.ps.qual = (List *)
+		ExecInitExpr((Expr *) node->plan.qual, &csstate->cps.ps);
+
+	/* tuple table initialization */
+	ExecInitResultTupleSlot(estate, &csstate->cps.ps);
+	csstate->curr_slot = ExecAllocTableSlot(&estate->es_tupleTable);
+
+	/* open the relation to be scanned */
+	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+	csstate->curr_rel = rel;
+	ExecSetSlotDescriptor(csstate->curr_slot, RelationGetDescr(rel));
+
+	csstate->cps.ps.ps_TupFromTlist = false;
+
+	ExecAssignResultTypeFromTL(&csstate->cps.ps);
+	ExecAssignProjectionInfo(&csstate->cps.ps, RelationGetDescr(rel));
+
+	/* Do nothing if EXPLAIN without ANALYZE */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return &csstate->cps;
+
+	/* Determine the scan strategy */
+	attrs_used = bms_copy(csplan->attrs_used);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), attrs_used, true);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_SHARED);
+		if (ccache->status < CCACHE_STATUS_CONSTRUCTED)
+		{
+			LWLockRelease(&ccache->lock);
+			LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+			if (ccache->status == CCACHE_STATUS_INITIALIZED)
+			{
+				ccache->status = CCACHE_STATUS_IN_PROGRESS;
+				csstate->with_construction = true;
+				scandesc = heap_beginscan(rel, SnapshotAny, 0, NULL);
+			}
+			else if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			{
+				csstate->normal_seqscan = true;
+				scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+			}
+		}
+		LWLockRelease(&ccache->lock);
+		csstate->ccache = ccache;
+
+		/* seek to the first position */
+		if (estate->es_direction == ForwardScanDirection)
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, 0);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, 0);
+		}
+		else
+		{
+			ItemPointerSetBlockNumber(&csstate->curr_ctid, MaxBlockNumber);
+			ItemPointerSetOffsetNumber(&csstate->curr_ctid, MaxOffsetNumber);
+		}
+	}
+	else
+	{
+		scandesc = heap_beginscan(rel, estate->es_snapshot, 0, NULL);
+		csstate->normal_seqscan = true;
+	}
+	csstate->curr_scan = scandesc;
+
+	return &csstate->cps;
+}
+
+/*
+ * cache_scan_needs_next
+ *
+ * We may fetch a tuple that is invisible to us, because the columnar cache
+ * stores all the live tuples, including ones updated or deleted by concurrent
+ * sessions, so it is the caller's job to check MVCC visibility.
+ * This function decides whether we need to move on to the next tuple because
+ * of the visibility check. If the given tuple is NULL, it is obviously time
+ * to stop searching because there are no more tuples on the cache.
+ */
+static bool
+cache_scan_needs_next(HeapTuple tuple, Snapshot snapshot, Buffer buffer)
+{
+	bool	visibility;
+
+	/* end of the scan */
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	visibility = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+	if (buffer != InvalidBuffer)
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	return !visibility;
+}
+
+static TupleTableSlot *
+cache_scan_next(CustomPlanState *node)
+{
+	CScanState	   *csstate = (CScanState *) node;
+	Relation		rel = csstate->curr_rel;
+	HeapScanDesc	scan = csstate->curr_scan;
+	TupleTableSlot *slot = csstate->curr_slot;
+	EState		   *estate = csstate->cps.ps.state;
+	Snapshot		snapshot = estate->es_snapshot;
+	HeapTuple		tuple;
+	Buffer			buffer;
+
+	do {
+		ccache_head	   *ccache = csstate->ccache;
+
+		if (!ccache)
+		{
+			/*
+			 * ccache == NULL implies one of two cases: (1) a fallback path
+			 * using a regular sequential scan instead of a cache-only scan,
+			 * or (2) cache construction failed during the scan. We need to
+			 * pay attention to the latter case because it uses SnapshotAny,
+			 * thus it fetches all the tuples including invisible ones.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+			buffer = scan->rs_cbuf;
+		}
+		else if (csstate->with_construction)
+		{
+			/*
+			 * "with_construction" means the columnar cache is under
+			 * construction, so we need to fetch a tuple from the heap of
+			 * the target relation and insert it into the cache.
+			 * Note that we use SnapshotAny to fetch all the tuples, both
+			 * visible and invisible ones, so it is our responsibility
+			 * to check tuple visibility according to the snapshot of the
+			 * current estate.
+			 * The same applies when we fetch tuples from the cache, without
+			 * referencing the heap buffer.
+			 */
+			tuple = heap_getnext(scan, estate->es_direction);
+
+			if (HeapTupleIsValid(tuple))
+			{
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				if (ccache_insert_tuple(ccache, rel, tuple))
+					LWLockRelease(&ccache->lock);
+				else
+				{
+					/*
+					 * If ccache_insert_tuple fails, it usually implies
+					 * a lack of shared memory, so we cannot continue
+					 * construction of the columnar cache. We put the
+					 * cache back while it is still in the
+					 * under-construction status, which prevents others
+					 * from grabbing it again, and we fall back to a
+					 * regular sequential scan for the remaining portion.
+					 */
+					cs_put_ccache(ccache);
+					LWLockRelease(&ccache->lock);
+					csstate->ccache = NULL;
+				}
+				buffer = scan->rs_cbuf;
+			}
+			else
+			{
+				/*
+				 * Once we reach the end of the relation, the columnar
+				 * cache has been constructed completely.
+				 */
+				LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+				ccache->status = CCACHE_STATUS_CONSTRUCTED;
+				LWLockRelease(&ccache->lock);
+				buffer = scan->rs_cbuf;
+			}
+		}
+		else
+		{
+			LWLockAcquire(&ccache->lock, LW_SHARED);
+			tuple = ccache_find_tuple(ccache->root_chunk,
+									  &csstate->curr_ctid,
+									  estate->es_direction);
+			if (HeapTupleIsValid(tuple))
+			{
+				ItemPointerCopy(&tuple->t_self, &csstate->curr_ctid);
+				tuple = heap_copytuple(tuple);
+			}
+			LWLockRelease(&ccache->lock);
+			buffer = InvalidBuffer;
+		}
+	} while (cache_scan_needs_next(tuple, snapshot, buffer));
+
+	if (HeapTupleIsValid(tuple))
+		ExecStoreTuple(tuple, slot, buffer, buffer == InvalidBuffer);
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+static bool
+cache_scan_recheck(CustomPlanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+cs_exec_custom_plan(CustomPlanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) cache_scan_next,
+					(ExecScanRecheckMtd) cache_scan_recheck);
+}
+
+static void
+cs_end_custom_plan(CustomPlanState *node)
+{
+	CScanState	   *csstate = (CScanState *) node;
+
+	if (csstate->ccache)
+	{
+		ccache_head	   *ccache = csstate->ccache;
+		bool			needs_remove = false;
+
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+		if (ccache->status == CCACHE_STATUS_IN_PROGRESS)
+			needs_remove = true;
+		LWLockRelease(&ccache->lock);
+
+		/*
+		 * If the status of the columnar cache is "in-progress", the table
+		 * scan didn't reach the end of the relation, thus the columnar
+		 * cache was not constructed completely.
+		 * Otherwise, we keep the ccache that was originally created with
+		 * refcnt=1, but untrack this ccache.
+		 */
+		if (needs_remove || !csstate->with_construction)
+			cs_put_ccache(ccache);
+		else if (csstate->with_construction)
+			untrack_ccache_locally(ccache);
+	}
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->cps.ps);
+
+	/* clean out the tuple table */
+	ExecClearTuple(csstate->cps.ps.ps_ResultTupleSlot);
+	ExecClearTuple(csstate->curr_slot);
+
+	/* close the heap scan */
+	if (csstate->curr_scan)
+		heap_endscan(csstate->curr_scan);
+	/* close the heap relation */
+	if (csstate->curr_rel)
+		heap_close(csstate->curr_rel, NoLock);
+}
+
+static void
+cs_rescan_custom_plan(CustomPlanState *node)
+{
+	elog(ERROR, "not implemented yet");
+}
+
+static void
+cs_explain_custom_plan_target_rel(CustomPlanState *node, ExplainState *es)
+{
+	CScanState *csstate = (CScanState *) node;
+	CScanPlan  *csplan = (CScanPlan *) csstate->cps.ps.plan;
+	Index		scanrelid = csplan->scanrelid;
+	char	   *refname;
+	char	   *objectname = NULL;
+	char	   *namespace = NULL;
+	RangeTblEntry  *rte;
+
+	rte = rt_fetch(scanrelid, es->rtable);
+	Assert(rte->rtekind == RTE_RELATION);
+
+	refname = (char *) list_nth(es->rtable_names, scanrelid - 1);
+	if (refname == NULL)
+		refname = rte->eref->aliasname;
+	objectname = get_rel_name(rte->relid);
+	if (es->verbose)
+		namespace = get_namespace_name(get_rel_namespace(rte->relid));
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		appendStringInfoString(es->str, " on");
+		if (namespace != NULL)
+			appendStringInfo(es->str, " %s.%s", quote_identifier(namespace),
+							 quote_identifier(objectname));
+		else if (objectname != NULL)
+			appendStringInfo(es->str, " %s", quote_identifier(objectname));
+		if (objectname == NULL || strcmp(refname, objectname) != 0)
+			appendStringInfo(es->str, " %s", quote_identifier(refname));
+	}
+	else
+	{
+		if (objectname != NULL)
+			ExplainPropertyText("Relation Name", objectname, es);
+		if (namespace != NULL)
+			ExplainPropertyText("Schema", namespace, es);
+		ExplainPropertyText("Alias", refname, es);
+	}
+}
+
+static void
+cs_explain_custom_plan(CustomPlanState *node,
+					   List *ancestors,
+					   ExplainState *es)
+{}
+
+static Bitmapset *
+cs_get_relids_custom_plan(CustomPlanState *node)
+{
+	CScanState *csstate = (CScanState *) node;
+	CScanPlan  *csplan = (CScanPlan *) csstate->cps.ps.plan;
+
+	return bms_make_singleton(csplan->scanrelid);
+}
+
+static void
+cs_textout_custom_plan(StringInfo str, const CustomPlan *node)
+{
+	CScanPlan  *csplan = (CScanPlan *) node;
+	Bitmapset  *tmpset = bms_copy(csplan->attrs_used);
+	int			x;
+
+	appendStringInfo(str, " :scanrelid %u", csplan->scanrelid);
+	appendStringInfo(str, " :attrs_used (b");
+	while ((x = bms_first_member(tmpset)) >= 0)
+		appendStringInfo(str, " %d", x);
+	appendStringInfoChar(str, ')');
+	bms_free(tmpset);
+}
+
+static CustomPlan *
+cs_copy_custom_plan(const CustomPlan *_from)
+{
+	const CScanPlan *from = (const CScanPlan *) _from;
+	CScanPlan  *newnode = palloc0(sizeof(CScanPlan));
+
+	CopyCustomPlanCommon((const Node *) from, (Node *) newnode);
+	newnode->scanrelid = from->scanrelid;
+	newnode->attrs_used = bms_copy(from->attrs_used);
+
+	return &newnode->cplan;
+}
+
+/*
+ * cache_scan_synchronizer
+ *
+ * trigger function to synchronize the columnar-cache with heap contents.
+ */
+Datum
+cache_scan_synchronizer(PG_FUNCTION_ARGS)
+{
+	TriggerData	   *trigdata = (TriggerData *) fcinfo->context;
+	Relation		rel = trigdata->tg_relation;
+	HeapTuple		tuple = trigdata->tg_trigtuple;
+	HeapTuple		newtup = trigdata->tg_newtuple;
+	HeapTuple		result = NULL;
+	const char	   *tg_name = trigdata->tg_trigger->tgname;
+	ccache_head	   *ccache;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		elog(ERROR, "%s: not fired by trigger manager", tg_name);
+
+	ccache = cs_get_ccache(RelationGetRelid(rel), NULL, false);
+	if (!ccache)
+		return PointerGetDatum(newtup);
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+	PG_TRY();
+	{
+		TriggerEvent	tg_event = trigdata->tg_event;
+
+		if (TRIGGER_FIRED_AFTER(tg_event) &&
+			TRIGGER_FIRED_FOR_ROW(tg_event) &&
+			TRIGGER_FIRED_BY_INSERT(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_UPDATE(tg_event))
+		{
+			ccache_insert_tuple(ccache, rel, newtup);
+			ccache_delete_tuple(ccache, tuple);
+			result = newtup;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_ROW(tg_event) &&
+				 TRIGGER_FIRED_BY_DELETE(tg_event))
+		{
+			ccache_delete_tuple(ccache, tuple);
+			result = tuple;
+		}
+		else if (TRIGGER_FIRED_AFTER(tg_event) &&
+				 TRIGGER_FIRED_FOR_STATEMENT(tg_event) &&
+				 TRIGGER_FIRED_BY_TRUNCATE(tg_event))
+		{
+			if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+				cs_put_ccache(ccache);
+		}
+		else
+			elog(ERROR, "%s: fired by unexpected context (%08x)",
+				 tg_name, tg_event);
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(&ccache->lock);
+		cs_put_ccache(ccache);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+
+	PG_RETURN_POINTER(result);
+}
+PG_FUNCTION_INFO_V1(cache_scan_synchronizer);
+
+/*
+ * ccache_on_object_access
+ *
+ * It drops an existing columnar cache if the cached table was altered or
+ * dropped.
+ */
+static void
+ccache_on_object_access(ObjectAccessType access,
+						Oid classId,
+						Oid objectId,
+						int subId,
+						void *arg)
+{
+	ccache_head	   *ccache;
+
+	/* ALTER TABLE and DROP TABLE need cache invalidation */
+	if (access != OAT_DROP && access != OAT_POST_ALTER)
+		return;
+	if (classId != RelationRelationId)
+		return;
+
+	ccache = cs_get_ccache(objectId, NULL, false);
+	if (!ccache)
+		return;
+
+	LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+	if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+		cs_put_ccache(ccache);
+	LWLockRelease(&ccache->lock);
+	cs_put_ccache(ccache);
+}
+
+/*
+ * ccache_on_page_prune
+ *
+ * It is a callback invoked when a particular heap block gets vacuumed.
+ * On vacuuming, the space occupied by dead tuples is reclaimed and tuple
+ * locations may be moved.
+ * This routine also reclaims the space used by dead tuples on the columnar
+ * cache according to the layout changes on the heap.
+ */
+static void
+ccache_on_page_prune(Relation relation,
+					 Buffer buffer,
+					 int ndeleted,
+					 TransactionId OldestXmin,
+					 TransactionId latestRemovedXid)
+{
+	ccache_head	   *ccache;
+	bool			result;
+
+	/* call the secondary hook */
+	if (heap_page_prune_next)
+		(*heap_page_prune_next)(relation, buffer, ndeleted,
+								OldestXmin, latestRemovedXid);
+
+	/*
+	 * If the relation already has a columnar cache, it also needs to be
+	 * cleaned up according to the heap vacuuming.
+	 */
+	ccache = cs_get_ccache(RelationGetRelid(relation), NULL, false);
+	if (ccache)
+	{
+		LWLockAcquire(&ccache->lock, LW_EXCLUSIVE);
+
+		result = ccache_vacuum_page(ccache, buffer);
+
+		LWLockRelease(&ccache->lock);
+
+		/* if failed to vacuum, drop cache */
+		if (!result)
+			cs_put_ccache(ccache);
+
+		cs_put_ccache(ccache);
+	}
+}
+
+void
+_PG_init(void)
+{
+	if (IsUnderPostmaster)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		errmsg("cache_scan must be loaded via shared_preload_libraries")));
+
+	DefineCustomBoolVariable("cache_scan.enabled",
+							 "turn on/off cache_scan feature on run-time",
+							 NULL,
+							 &cache_scan_enabled,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	DefineCustomRealVariable("cache_scan.width_threshold",
+							 "threshold percentage to be cached",
+							 NULL,
+							 &cache_scan_width_threshold,
+							 30.0,
+							 0.0,
+							 100.0,
+							 PGC_SIGHUP,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* initialization of cache subsystem */
+	ccache_init();
+
+	/* callbacks for cache invalidation */
+	object_access_next = object_access_hook;
+	object_access_hook = ccache_on_object_access;
+
+	heap_page_prune_next = heap_page_prune_hook;
+	heap_page_prune_hook = ccache_on_page_prune;
+
+	/* registration of custom scan provider */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = cs_add_scan_path;
+
+	/* setting up static plan/path methods */
+	memset(&cache_scan_path_methods, 0, sizeof(CustomPathMethods));
+	cache_scan_path_methods.CustomName = "cache scan";
+	cache_scan_path_methods.CreateCustomPlan = cs_create_custom_plan;
+	cache_scan_path_methods.TextOutCustomPath = cs_textout_custom_path;
+
+	memset(&cache_scan_plan_methods, 0, sizeof(CustomPlanMethods));
+	cache_scan_plan_methods.CustomName = "cache scan";
+	cache_scan_plan_methods.SetCustomPlanRef = cs_set_custom_plan_ref;
+	cache_scan_plan_methods.SupportBackwardScan = NULL;
+	cache_scan_plan_methods.FinalizeCustomPlan = cs_finalize_custom_plan;
+	cache_scan_plan_methods.BeginCustomPlan = cs_begin_custom_plan;
+	cache_scan_plan_methods.ExecCustomPlan = cs_exec_custom_plan;
+	cache_scan_plan_methods.EndCustomPlan = cs_end_custom_plan;
+	cache_scan_plan_methods.ReScanCustomPlan = cs_rescan_custom_plan;
+	cache_scan_plan_methods.MarkPosCustomPlan = NULL;
+	cache_scan_plan_methods.RestrPosCustomPlan = NULL;
+	cache_scan_plan_methods.ExplainCustomPlanTargetRel
+		= cs_explain_custom_plan_target_rel;
+	cache_scan_plan_methods.ExplainCustomPlan = cs_explain_custom_plan;
+	cache_scan_plan_methods.GetRelidsCustomPlan = cs_get_relids_custom_plan;
+	cache_scan_plan_methods.GetSpecialCustomVar = NULL;
+	cache_scan_plan_methods.TextOutCustomPlan = cs_textout_custom_plan;
+	cache_scan_plan_methods.CopyCustomPlan = cs_copy_custom_plan;
+}
diff --git a/doc/src/sgml/cache-scan.sgml b/doc/src/sgml/cache-scan.sgml
new file mode 100644
index 0000000..df8d0de
--- /dev/null
+++ b/doc/src/sgml/cache-scan.sgml
@@ -0,0 +1,266 @@
+<!-- doc/src/sgml/cache-scan.sgml -->
+
+<sect1 id="cache-scan" xreflabel="cache-scan">
+ <title>cache-scan</title>
+
+ <indexterm zone="cache-scan">
+  <primary>cache-scan</primary>
+ </indexterm>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   The <filename>cache_scan</> module provides an alternative way to scan
+   relations using an on-memory columnar cache instead of the usual heap
+   scan, in case a previous scan already holds the contents of the table
+   on the cache.
+   Unlike the buffer cache, it holds the contents of a limited number of
+   columns, not the whole record, thus it can hold a larger number of
+   records in the same amount of RAM. This characteristic is probably most
+   useful for analytic queries on a table with many columns and records.
+  </para>
+  <para>
+   Once this module gets loaded, it registers itself as a custom-scan
+   provider, which can offer an additional scan path on regular relations
+   using the on-memory columnar cache, instead of the regular heap scan.
+   It also serves as a proof-of-concept implementation of the custom-scan
+   API that allows extensions to extend the core executor system.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Installation</title>
+  <para>
+   This module has to be loaded using the
+   <xref linkend="guc-shared-preload-libraries"> parameter so that it can
+   acquire a particular amount of shared memory at startup time.
+   In addition, the relation to be cached needs special triggers, called
+   synchronizers, implemented with the <literal>cache_scan_synchronizer</>
+   function, which keep the cache contents synchronized with the latest
+   heap on <command>INSERT</>, <command>UPDATE</>, <command>DELETE</> or
+   <command>TRUNCATE</>.
+  </para>
+  <para>
+   You can set up this extension according to the following steps.
+  </para>
+  <procedure>
+   <step>
+    <para>
+     Adjust the <xref linkend="guc-shared-preload-libraries"> parameter to
+     load the <filename>cache_scan</> binary at startup time, then restart
+     the postmaster.
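+     A minimal <filename>postgresql.conf</> entry (assuming no other
+     libraries need to be preloaded) would be:
+<programlisting>
+shared_preload_libraries = 'cache_scan'
+</programlisting>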
+    </para>
+   </step>
+   <step>
+    <para>
+     Run <xref linkend="sql-createextension"> to create synchronizer
+     function of <filename>cache_scan</>.
+<programlisting>
+CREATE EXTENSION cache_scan;
+</programlisting>
+    </para>
+   </step>
+   <step>
+    <para>
+     Create synchronizer triggers on the target relation.
+<programlisting>
+CREATE TRIGGER t1_cache_row_sync
+    AFTER INSERT OR UPDATE OR DELETE ON t1 FOR ROW
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+CREATE TRIGGER t1_cache_stmt_sync
+    AFTER TRUNCATE ON t1 FOR STATEMENT
+    EXECUTE PROCEDURE cache_scan_synchronizer();
+</programlisting>
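+     As a quick sanity check (just one possible way to verify the setup; the
+     trigger names above are only examples), you can confirm that both
+     triggers exist on the relation:
+<programlisting>
+postgres=# SELECT tgname FROM pg_trigger WHERE tgrelid = 't1'::regclass;
+       tgname
+--------------------
+ t1_cache_row_sync
+ t1_cache_stmt_sync
+(2 rows)
+</programlisting>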
+    </para>
+   </step>
+  </procedure>
+ </sect2>
+
+ <sect2>
+  <title>How It Works</title>
+  <para>
+   This module behaves as a usual custom-scan provider; see
+   <xref linkend="custom-scan">.
+   It offers an alternative way to scan a relation if the relation has
+   synchronizer triggers and the total width of the referenced columns is
+   less than 30% of the average record width (see
+   <literal>cache_scan.width_threshold</> below).
+   The query optimizer then picks up the cheapest path. If the chosen path
+   is a custom-scan path managed by <filename>cache_scan</>, the scan runs
+   on the target relation using the columnar cache.
+   On the first run, it constructs the relation's cache along with a regular
+   sequential scan. On subsequent runs, it can scan the columnar cache
+   without referencing the heap at all.
+  </para>
+  <para>
+   You can check whether the query plan uses <filename>cache_scan</> with
+   the <xref linkend="sql-explain"> command, as follows:
+<programlisting>
+postgres=# EXPLAIN (costs off) SELECT a,b FROM t1 WHERE b < pi();
+                     QUERY PLAN
+----------------------------------------------------
+ Custom Scan (cache scan) on t1
+   Filter: (b < 3.14159265358979::double precision)
+(2 rows)
+</programlisting>
+  </para>
+  <para>
+   A columnar cache, associated with a particular relation, has one or more
+   chunks that perform as nodes or leaves of a t-tree structure.
+   The <literal>cache_scan_debuginfo()</> function can dump useful
+   information, namely the properties of all the active chunks, as follows.
+<programlisting>
+postgres=# SELECT * FROM cache_scan_debuginfo();
+ tableoid |   status    |     chunk      |     upper      | l_depth |    l_chunk     | r_depth |    r_chunk     | ntuples |  usage  | min_ctid  | max_ctid
+----------+-------------+----------------+----------------+---------+----------------+---------+----------------+---------+---------+-----------+-----------
+    16400 | constructed | 0x7f2b8ad84740 | 0x7f2b8af84740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (0,1)     | (677,15)
+    16400 | constructed | 0x7f2b8af84740 | (nil)          |       1 | 0x7f2b8ad84740 |       2 | 0x7f2b8b384740 |   29126 |  233088 | (677,16)  | (1354,30)
+    16400 | constructed | 0x7f2b8b184740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |   29126 |  233088 | (1354,31) | (2032,2)
+    16400 | constructed | 0x7f2b8b384740 | 0x7f2b8af84740 |       1 | 0x7f2b8b184740 |       1 | 0x7f2b8b584740 |   29126 |  233088 | (2032,3)  | (2709,33)
+    16400 | constructed | 0x7f2b8b584740 | 0x7f2b8b384740 |       0 | (nil)          |       0 | (nil)          |    3478 | 1874560 | (2709,34) | (2790,28)
+(5 rows)
+</programlisting>
+  </para>
+  <para>
+   All the cached tuples are indexed in <literal>ctid</> order, and each
+   chunk has an array of partial tuples together with min- and max-values.
+   Its left node links to chunks that hold tuples with smaller
+   <literal>ctid</>s, and its right node links to chunks that hold larger
+   ones. This makes it possible to find tuples in a timely fashion when they
+   need to be invalidated according to heap updates by DDL, DML or vacuuming.
+  </para>
+  <para>
+   The columnar cache is not owned by a particular session, so its contents
+   are retained until the cache is dropped or the postmaster restarts.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>GUC Parameters</title>
+  <variablelist>
+   <varlistentry id="guc-cache-scan-block_size" xreflabel="cache_scan.block_size">
+    <term><varname>cache_scan.block_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.block_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the size of each block on the shared memory
+      segment for the columnar cache. This parameter can only be set at
+      server start.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires
+      <literal>cache_scan.num_blocks</> x <literal>cache_scan.block_size</>
+      bytes of shared memory at startup time, then allocates it to the
+      columnar cache on demand.
+      Too large a block size reduces the flexibility of memory assignment,
+      and too small a block size consumes much management area per block.
+      So, we recommend keeping the default value, which is 2MB per block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-num_blocks" xreflabel="cache_scan.num_blocks">
+    <term><varname>cache_scan.num_blocks</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.num_blocks</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of blocks on the shared memory
+      segment for the columnar cache. This parameter can only be set at
+      server start.
+     </para>
+     <para>
+      The <filename>cache_scan</> module acquires
+      <literal>cache_scan.num_blocks</> x <literal>cache_scan.block_size</>
+      bytes of shared memory at startup time, then allocates it to the
+      columnar cache on demand.
+      Too small a number of blocks reduces the flexibility of memory
+      assignment and may cause undesired cache dropping.
+      So, we recommend setting a number of blocks large enough to keep the
+      contents of the target relations on memory.
+      Its default is <literal>64</literal>; probably too small for most
+      real use cases.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-hash_size" xreflabel="cache_scan.hash_size">
+    <term><varname>cache_scan.hash_size</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.hash_size</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the number of slots of the internal hash
+      table, which links every columnar cache hashed by the table's OID.
+      Its default is <literal>128</>; usually there is no need to adjust it.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-max_cached_attnum" xreflabel="cache_scan.max_cached_attnum">
+    <term><varname>cache_scan.max_cached_attnum</> (<type>integer</type>)</term>
+    <indexterm>
+     <primary><varname>cache_scan.max_cached_attnum</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the maximum attribute number we can keep on
+      the columnar cache. Because of the internal data representation,
+      the bitmap set that tracks the cached attributes has to be of fixed
+      length, thus the largest attribute number needs to be fixed in
+      advance.
+      Its default is <literal>128</>; most tables have fewer than
+      100 columns, so this is usually sufficient.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-enabled" xreflabel="cache_scan.enabled">
+    <term><varname>cache_scan.enabled</> (<type>boolean</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.enabled</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter enables or disables the query planner's use of
+      cache-only scans, even if the feature is otherwise ready to run.
+      Note that this parameter does not affect the synchronizer triggers,
+      so an already constructed columnar cache is still kept synchronized
+      even if cache-only scans are disabled later.
+      The default is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry id="guc-cache-scan-width_threshold" xreflabel="cache_scan.width_threshold">
+    <term><varname>cache_scan.width_threshold</> (<type>float</type>) </term>
+    <indexterm>
+     <primary><varname>cache_scan.width_threshold</> configuration parameter</>
+    </indexterm>
+    <listitem>
+     <para>
+      This parameter controls the threshold for the planner to consider
+      a cache-only scan plan. (If the proposed scan plan is cheap enough,
+      the planner will choose it instead of the built-in ones.)
+      This extension tries to build a cache-only scan plan if the total
+      width of the referenced columns is less than the threshold, given
+      as a percentage of the average record width.
+      The default is <literal>30.0</>, which means a cache-only scan plan
+      is proposed to the planner if the sum of the widths of the referenced
+      columns is less than
+      <literal>(30.0 / 100.0) x (average width of table)</>.
+      For example, on a table whose average record width is 200 bytes,
+      a query referencing 40 bytes worth of columns (20%) qualifies, while
+      one referencing 80 bytes (40%) does not.
+     </para>
+     <para>
+      Because the columnar cache feature only makes sense if the width of
+      the cached columns is much less than the total width of the table
+      definition, this threshold keeps out scans that reference many
+      columns, which would consume an unignorable amount of shared memory
+      and eventually kill the benefit.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
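+  <para>
+   As an illustration only (the values below are arbitrary examples, not
+   recommendations), a <filename>postgresql.conf</> setup that reserves
+   roughly 1GB of shared memory for the columnar cache could look like:
+<programlisting>
+shared_preload_libraries = 'cache_scan'
+cache_scan.num_blocks = 512        # 512 blocks x 2MB default block size = 1GB
+cache_scan.width_threshold = 30.0  # propose a cache scan only if the referenced
+                                   # columns are under 30% of the average row width
+</programlisting>
+  </para>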
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ec68f10..1da30b8 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -107,6 +107,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &auto-explain;
  &btree-gin;
  &btree-gist;
+ &cache-scan;
  &chkpass;
  &citext;
  &cube;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 33f964e..e7edd0e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -103,6 +103,7 @@
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
+<!ENTITY cache-scan      SYSTEM "cache-scan.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
pgsql-v9.4-mvcc_allows_cache.v11.patch (attachment)
 src/backend/utils/time/tqual.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index c4732ed..202f033 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -105,11 +105,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  *
  * The caller should pass xid as the XID of the transaction to check, or
  * InvalidTransactionId if no check is needed.
+ *
+ * In case when the supplied HeapTuple is not associated with a particular
+ * buffer, it just returns without any jobs. It may happen when an extension
+ * caches tuple with their own way.
  */
 static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (BufferIsInvalid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
pgsql-v9.4-vacuum_page_hook.v11.patch (attachment)
 src/backend/access/heap/pruneheap.c | 13 +++++++++++++
 src/include/access/heapam.h         |  7 +++++++
 2 files changed, 20 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3c69e1b..5bf7cea 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,6 +43,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -311,6 +314,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	 * and update FSM with the remaining space.
 	 */
 
+	/*
+	 * This callback allows extensions to synchronize their own status with
+	 * heap image on the disk, when this buffer page is vacuumed.
+	 */
+	if (heap_page_prune_hook)
+		(*heap_page_prune_hook)(relation,
+								buffer,
+								ndeleted,
+								OldestXmin,
+								prstate.latestRemovedXid);
 	return ndeleted;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0f80257..e88e839 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -164,6 +164,13 @@ extern void heap_restrpos(HeapScanDesc scan);
 extern void heap_sync(Relation relation);
 
 /* in heap/pruneheap.c */
+typedef void (*heap_page_prune_hook_type)(Relation relation,
+										  Buffer buffer,
+										  int ndeleted,
+										  TransactionId OldestXmin,
+										  TransactionId latestRemovedXid);
+extern PGDLLIMPORT heap_page_prune_hook_type heap_page_prune_hook;
+
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern int heap_page_prune(Relation relation, Buffer buffer,
 				TransactionId OldestXmin,
#24Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kouhei Kaigai (#23)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Hello,

The attached patches are revised ones according to the latest custom-plan
interface patch (v11).
The cache-scan module was re-implemented on the newer interface, and I also
noticed the extension did not handle redirected tuples correctly, so I
revised the logic in ccache_vacuum_page() entirely. It now synchronizes the
cached tuples per page, not per tuple, and also tries to merge t-tree chunks
on a per-page basis.

Also, I split the patches again because the *demonstration* part is much
larger than the patches to the core backend. That should help reviewing.
* pgsql-v9.4-vacuum_page_hook.v11.patch
-> It adds a hook for each page being vacuumed; extensions need it to
synchronize the status of their in-memory caches.
* pgsql-v9.4-mvcc_allows_cache.v11.patch
-> It allows running HeapTupleSatisfiesVisibility() on tuples held in
the in-memory cache, not on the heap.
* pgsql-v9.4-example-cache_scan.v11.patch
-> It demonstrates the usage of the above two patches. It allows scanning
a relation without storage access when possible.

All the patches are good. The cache scan extension patch may need further
refinement in terms of performance improvement, but that can also be
handled later.
So I am marking the patch as "ready for committer". Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Haribabu Kommi (#24)
Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

Also, I split the patches again because the *demonstration* part is much
larger than the patches to the core backend. That should help reviewing.
* pgsql-v9.4-vacuum_page_hook.v11.patch
-> It adds a hook for each page being vacuumed; extensions need it to
synchronize the status of their in-memory caches.
* pgsql-v9.4-mvcc_allows_cache.v11.patch
-> It allows running HeapTupleSatisfiesVisibility() on tuples held in
the in-memory cache, not on the heap.
* pgsql-v9.4-example-cache_scan.v11.patch
-> It demonstrates the usage of the above two patches. It allows scanning
a relation without storage access when possible.

All the patches are good. The cache scan extension patch may need further
refinement in terms of performance improvement, but that can also be
handled later.
So I am marking the patch as "ready for committer". Thanks for the patch.

Thanks for your dedicated efforts on these patches.

The smaller portions of the above submission can be applied independently of
the custom-plan interface, and their scale is much smaller than the contrib/
portion (about 30 lines in total).
If someone can pick them up separately from the extension portion, that also
makes sense. I intended to implement the extension portion as simply as I
could, for demonstration purposes rather than performance; however, its scale
is about 2.5KL. :-(
Yes, I know the time pressure towards the v9.4 final feature freeze....

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
