GIN improvements part 1: additional information
Hackers,
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.
Resemble GIN interface changes that this patch introduce.
Patch modifies GIN interface as following:
1) Two arguments are added to extractValue
Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent
Datum addInfo[], bool addInfoIsNull[]
3) New method config is introduced which returns datatype oid of addtional
information (analogy with SP-GiST config method).
Additionally there is another patch which demonstrates benefits from
additional information storage itself (because it don't accelerate tsearch
itselt). It comes in separate thread.
------
With best regards,
Alexander Korotkov.
Attachments:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.
New version of patch is attached with some more refactoring and bug fixes.
------
With best regards,
Alexander Korotkov.
Attachments:
On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <aekorotkov@gmail.com
wrote:
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.New version of patch is attached with some more refactoring and bug fixes.
Another revision with more bug fixes.
------
With best regards,
Alexander Korotkov.
Attachments:
On Wed, Jun 19, 2013 at 12:44 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <
aekorotkov@gmail.com> wrote:Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.New version of patch is attached with some more refactoring and bug fixes.
Another revision with more bug fixes.
New revision of patch is attached. Now it includes some docs.
------
With best regards,
Alexander Korotkov.
Attachments:
On 06/25/2013 12:03 AM, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:
* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever
* gin_private.h:ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.
* gindatapage.c:ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.
Regards,
Antonin Houska (Tony)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 25.06.2013 01:03, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Thanks! I'm looking into this in detail now.
First, this patch actually contains two major things:
1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.
These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.
The patch needs a lot of cleanup still, and I may well have broken some
stuff, but I'm quite pleased with the performance. I tested this with
two tables; one is the titles from the DBLP dataset. Another is integer
arrays, created with this:
create function randomintarr() returns int[] as $$ select
array_agg((random() * 1000.0)::int4) from generate_series(1,10) $$
language sql;
create table intarrtbl as select randomintarr() as ii from
generate_series(1, 10000000);
The effect on the index sizes is quite dramatic:
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size |
--------+--------------------+-------+--------+-------------+--------+
public | gin_intarr_master | index | heikki | intarrtbl | 585 MB |
public | gin_intarr_patched | index | heikki | intarrtbl | 211 MB |
public | gin_title | index | heikki | dblp_titles | 93 MB |
public | gin_title_master | index | heikki | dblp_titles | 180 MB |
(4 rows)
Tomas Vondra tested the search performance of an earlier version of this
patch: /messages/by-id/50BFF89A.7080908@fuzzy.cz).
He initially saw a huge slowdown, but could not reproduce it with a
later version of the patch. I did not see much difference in a few quick
queries I ran, so we're probably good on that front.
There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with
the patch, so anyone using GIN probably wants to rebuild them anyway,
sooner or later. Still, I'd like to give it a shot.
2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?
3. I'd like to see some performance testing of insertions, deletions,
and vacuum. I suspect that maintaining the 32-entry index might be
fairly expensive, as it's rewritten on every update to a leaf page.
- Heikki
Attachments:
gin-packed-postinglists-1.patchtext/x-diff; name=gin-packed-postinglists-1.patchDownload
*** a/src/backend/access/gin/README
--- b/src/backend/access/gin/README
***************
*** 145,150 **** none appear in the key entry itself. The separate pages are called a
--- 145,151 ----
Note that in either case, the ItemPointers associated with a key can
easily be read out in sorted order; this is relied on by the scan
algorithms.
+ FIXME: Update above paragraph!
* The index tuple header fields of a leaf key entry are abused as follows:
*** a/src/backend/access/gin/gindatapage.c
--- b/src/backend/access/gin/gindatapage.c
***************
*** 17,22 ****
--- 17,113 ----
#include "access/gin_private.h"
#include "utils/rel.h"
+ /*
+ * Write item pointer into leaf data page using varbyte encoding. Since
+ * BlockNumber is stored in incremental manner we also need a previous item
+ * pointer.
+ */
+ char *
+ ginDataPageLeafWriteItemPointer(char *ptr, ItemPointer iptr, ItemPointer prev)
+ {
+ uint32 blockNumberIncr;
+ uint16 offset;
+ uint8 v;
+
+ blockNumberIncr = iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16) -
+ (prev->ip_blkid.bi_lo + (prev->ip_blkid.bi_hi << 16));
+ for (;;)
+ {
+ if (blockNumberIncr < HIGHBIT)
+ {
+ v = (uint8) blockNumberIncr;
+ *ptr = v;
+ ptr++;
+ break;
+ }
+ else
+ {
+ v = ((uint8) blockNumberIncr) | HIGHBIT;
+ *ptr = v;
+ ptr++;
+ blockNumberIncr >>= 7;
+ }
+ }
+
+ offset = iptr->ip_posid;
+ for (;;)
+ {
+ if (offset < HIGHBIT)
+ {
+ v = (uint8) offset;
+ *ptr = v;
+ ptr++;
+ break;
+ }
+ else
+ {
+ v = ((uint8) offset) | HIGHBIT;
+ *ptr = v;
+ ptr++;
+ offset >>= 7;
+ }
+ }
+
+ return ptr;
+ }
+
+ /*
+ * Calculate size of incremental varbyte encoding of item pointer.
+ */
+ static int
+ ginDataPageLeafGetItemPointerSize(ItemPointer iptr, ItemPointer prev)
+ {
+ uint32 blockNumberIncr;
+ uint16 offset;
+ int size = 0;
+
+ blockNumberIncr = iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16) -
+ (prev->ip_blkid.bi_lo + (prev->ip_blkid.bi_hi << 16));
+ do
+ {
+ size++;
+ blockNumberIncr >>= 7;
+ } while (blockNumberIncr > 0);
+
+ offset = iptr->ip_posid;
+ do
+ {
+ size++;
+ offset >>= 7;
+ } while (offset > 0);
+
+ return size;
+ }
+
+ /*
+ * Returns size of item pointers if leaf data page after inserting another one.
+ */
+ Size
+ ginCheckPlaceToDataPageLeaf(ItemPointer iptr, ItemPointer prev, Size size)
+ {
+ return size + ginDataPageLeafGetItemPointerSize(iptr, prev);
+ }
+
int
ginCompareItemPointers(ItemPointer a, ItemPointer b)
{
***************
*** 159,164 **** dataLocateItem(GinBtree btree, GinBtreeStack *stack)
--- 250,322 ----
}
/*
+ * Find item pointer in leaf data page. Returns true if given item pointer is
+ * found and false if it's not. Sets offset and iptrOut to last item pointer
+ * which is less than given one. Sets ptrOut ahead that item pointer.
+ */
+ static bool
+ findInLeafPage(GinBtree btree, Page page, OffsetNumber *offset,
+ ItemPointerData *iptrOut, Pointer *ptrOut)
+ {
+ Pointer ptr = GinDataPageGetData(page);
+ OffsetNumber i, maxoff, first = FirstOffsetNumber;
+ ItemPointerData iptr = {{0,0},0};
+ int cmp;
+
+ maxoff = GinPageGetOpaque(page)->maxoff;
+
+ /*
+ * At first, search index at the end of page. As the result we narrow
+ * [first, maxoff] range.
+ */
+ for (i = 0; i < GinDataLeafIndexCount; i++)
+ {
+ GinDataLeafItemIndex *index = &GinPageGetIndexes(page)[i];
+ if (index->offsetNumer == InvalidOffsetNumber)
+ break;
+
+ cmp = ginCompareItemPointers(&index->iptr, btree->items + btree->curitem);
+ if (cmp < 0)
+ {
+ ptr = GinDataPageGetData(page) + index->pageOffset;
+ first = index->offsetNumer;
+ iptr = index->iptr;
+ }
+ else
+ {
+ maxoff = index->offsetNumer - 1;
+ break;
+ }
+ }
+
+ /* Search page in [first, maxoff] range found by page index */
+ for (i = first; i <= maxoff; i++)
+ {
+ *ptrOut = ptr;
+ *iptrOut = iptr;
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+
+ cmp = ginCompareItemPointers(btree->items + btree->curitem, &iptr);
+ if (cmp == 0)
+ {
+ *offset = i;
+ return true;
+ }
+ if (cmp < 0)
+ {
+ *offset = i;
+ return false;
+ }
+ }
+
+ *ptrOut = ptr;
+ *iptrOut = iptr;
+ *offset = GinPageGetOpaque(page)->maxoff + 1;
+ return false;
+ }
+
+
+ /*
* Searches correct position for value on leaf page.
* Page should be correctly chosen.
* Returns true if value found on page.
***************
*** 167,175 **** static bool
dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
{
Page page = BufferGetPage(stack->buffer);
! OffsetNumber low,
! high;
! int result;
Assert(GinPageIsLeaf(page));
Assert(GinPageIsData(page));
--- 325,332 ----
dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
{
Page page = BufferGetPage(stack->buffer);
! ItemPointerData iptr;
! Pointer ptr;
Assert(GinPageIsLeaf(page));
Assert(GinPageIsData(page));
***************
*** 180,215 **** dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
return TRUE;
}
! low = FirstOffsetNumber;
! high = GinPageGetOpaque(page)->maxoff;
!
! if (high < low)
! {
! stack->off = FirstOffsetNumber;
! return false;
! }
!
! high++;
!
! while (high > low)
! {
! OffsetNumber mid = low + ((high - low) / 2);
!
! result = ginCompareItemPointers(btree->items + btree->curitem, (ItemPointer) GinDataPageGetItem(page, mid));
- if (result == 0)
- {
- stack->off = mid;
- return true;
- }
- else if (result > 0)
- low = mid + 1;
- else
- high = mid;
- }
-
- stack->off = high;
- return false;
}
/*
--- 337,344 ----
return TRUE;
}
! return findInLeafPage(btree, page, &stack->off, &iptr, &ptr);
}
/*
***************
*** 333,345 **** dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
if (GinPageIsLeaf(page))
{
! if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
{
! if ((btree->nitem - btree->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
! return true;
}
! else if (sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
return true;
}
else if (sizeof(PostingItem) <= GinDataPageGetFreeSpace(page))
return true;
--- 462,498 ----
if (GinPageIsLeaf(page))
{
! int i, n, j;
! ItemPointerData iptr = {{0,0},0};
! Size size = 0;
!
! /*
! * Calculate additional size using worst case assumption: varbyte
! * encoding from zero item pointer. Also use worst case assumption about
! * alignment.
! */
! n = GinPageGetOpaque(page)->maxoff;
!
! if (GinPageRightMost(page) && off > n)
! {
! for (j = btree->curitem; j < btree->nitem; j++)
! {
! size = ginCheckPlaceToDataPageLeaf(&btree->items[j],
! (j == btree->curitem) ? (&iptr) : &btree->items[j - 1],
! size);
! i++;
! }
! }
! else
{
! j = btree->curitem;
! size = ginCheckPlaceToDataPageLeaf(&btree->items[j], &iptr, size);
}
! size += MAXIMUM_ALIGNOF;
!
! if (GinDataPageGetFreeSpace(page) >= size)
return true;
+
}
else if (sizeof(PostingItem) <= GinDataPageGetFreeSpace(page))
return true;
***************
*** 380,391 **** static void
dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prdata)
{
Page page = BufferGetPage(buf);
- int sizeofitem = GinSizeOfDataPageItem(page);
int cnt = 0;
/* these must be static so they can be returned to caller */
static XLogRecData rdata[3];
static ginxlogInsert data;
*prdata = rdata;
Assert(GinPageIsData(page));
--- 533,544 ----
dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prdata)
{
Page page = BufferGetPage(buf);
int cnt = 0;
/* these must be static so they can be returned to caller */
static XLogRecData rdata[3];
static ginxlogInsert data;
+ static char insertData[BLCKSZ];
*prdata = rdata;
Assert(GinPageIsData(page));
***************
*** 395,401 **** dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
data.node = btree->index->rd_node;
data.blkno = BufferGetBlockNumber(buf);
data.offset = off;
! data.nitem = 1;
data.isDelete = FALSE;
data.isData = TRUE;
data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
--- 548,554 ----
data.node = btree->index->rd_node;
data.blkno = BufferGetBlockNumber(buf);
data.offset = off;
! data.nitem = 0;
data.isDelete = FALSE;
data.isData = TRUE;
data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
***************
*** 423,457 **** dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
rdata[cnt].next = &rdata[cnt + 1];
cnt++;
- rdata[cnt].buffer = InvalidBuffer;
- rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem));
- rdata[cnt].len = sizeofitem;
- rdata[cnt].next = NULL;
-
if (GinPageIsLeaf(page))
{
! if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
{
! /* usually, create index... */
! uint32 savedPos = btree->curitem;
! while (btree->curitem < btree->nitem)
{
! GinDataPageAddItem(page, btree->items + btree->curitem, off);
! off++;
! btree->curitem++;
}
- data.nitem = btree->curitem - savedPos;
- rdata[cnt].len = sizeofitem * data.nitem;
}
else
{
! GinDataPageAddItem(page, btree->items + btree->curitem, off);
btree->curitem++;
}
}
! else
! GinDataPageAddItem(page, &(btree->pitem), off);
}
/*
--- 576,908 ----
rdata[cnt].next = &rdata[cnt + 1];
cnt++;
if (GinPageIsLeaf(page))
{
! int i = 0, j, max_j;
! Pointer ptr = GinDataPageGetData(page), next_ptr, insertStart;
! ItemPointerData iptr = {{0,0},0}, next_iptr;
! char pageCopy[BLCKSZ];
! int maxoff = GinPageGetOpaque(page)->maxoff;
! int copySize = 0;
!
! /*
! * We're going to prevent var-byte re-encoding of whole page.
! * Find position in page using page indexes.
! */
! findInLeafPage(btree, page, &off, &iptr, &ptr);
!
! Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
!
! if (off <= maxoff)
! {
! /*
! * Read next item pointer: we'll have to re-encode it. Copy
! * previous part of page
! */
! next_iptr = iptr;
! next_ptr = ginDataPageLeafReadItemPointer(ptr, &next_iptr);
! copySize = GinDataPageSize - GinDataPageGetFreeSpace(page) -
! (next_ptr - GinDataPageGetData(page));
! memcpy(pageCopy, next_ptr, copySize);
! }
!
! /* Check how many items we're going to add */
! if (GinPageRightMost(page) && off > maxoff)
! max_j = btree->nitem;
! else
! max_j = btree->curitem + 1;
!
! /* Place items to the page while we have enough of space */
! *((ItemPointerData *)insertData) = iptr;
! insertStart = ptr;
! i = 0;
! for (j = btree->curitem; j < max_j; j++)
! {
! Pointer ptr2;
!
! ptr2 = page + ginCheckPlaceToDataPageLeaf(&btree->items[j],
! &iptr, ptr - page);
!
! if (GinDataPageFreeSpacePre(page, ptr2) < 0)
! break;
!
! ptr = ginDataPageLeafWriteItemPointer(ptr, &btree->items[j], &iptr);
! Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
!
! iptr = btree->items[j];
! btree->curitem++;
! data.nitem++;
! i++;
! }
!
! /* Put WAL data */
! memcpy(insertData + sizeof(ItemPointerData), insertStart,
! ptr - insertStart);
! rdata[cnt].buffer = InvalidBuffer;
! rdata[cnt].data = insertData;
! rdata[cnt].len = sizeof(ItemPointerData) + (ptr - insertStart);
! rdata[cnt].next = NULL;
!
! /* Place rest of the page back */
! if (off <= maxoff)
{
! ptr = ginDataPageLeafWriteItemPointer(ptr, &next_iptr, &iptr);
! Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
! memcpy(ptr, pageCopy, copySize);
! }
! GinPageGetOpaque(page)->maxoff += i;
!
! if (GinDataPageFreeSpacePre(page,ptr) < 0)
! elog(ERROR, "Not enough of space in leaf page!");
!
! /* Update indexes in the end of page */
! updateItemIndexes(page, btree->ginstate);
! }
! else
! {
! rdata[cnt].buffer = InvalidBuffer;
! rdata[cnt].data = (char *) &(btree->pitem);
! rdata[cnt].len = sizeof(PostingItem);
! rdata[cnt].next = NULL;
! data.nitem = 1;
!
! GinDataPageAddItem(page, &(btree->pitem), off);
! }
! }
!
! /* Macro for leaf data page split: switch to right page if needed. */
! #define CHECK_SWITCH_TO_RPAGE \
! do { \
! if (ptr - GinDataPageGetData(page) > \
! totalsize / 2 && page == lpage) \
! { \
! maxLeftIptr = iptr; \
! prevIptr.ip_blkid.bi_hi = 0; \
! prevIptr.ip_blkid.bi_lo = 0; \
! prevIptr.ip_posid = 0; \
! GinPageGetOpaque(lpage)->maxoff = j; \
! page = rpage; \
! ptr = GinDataPageGetData(rpage); \
! j = FirstOffsetNumber; \
! } \
! else \
! { \
! j++; \
! } \
! } while (0)
!
!
!
! /*
! * Place tuple and split page, original buffer(lbuf) leaves untouched,
! * returns shadow page of lbuf filled new data.
! */
! static Page
! dataSplitPageLeaf(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
! XLogRecData **prdata)
! {
! OffsetNumber i, j,
! maxoff;
! Size totalsize = 0, prevTotalsize;
! Pointer ptr, copyPtr;
! Page page;
! Page lpage = PageGetTempPageCopy(BufferGetPage(lbuf));
! Page rpage = BufferGetPage(rbuf);
! Size pageSize = PageGetPageSize(lpage);
! Size maxItemSize = 0;
! ItemPointerData iptr, prevIptr, maxLeftIptr;
! int totalCount = 0;
! int maxItemIndex = btree->curitem;
!
! /* these must be static so they can be returned to caller */
! static XLogRecData rdata[3];
! static ginxlogSplit data;
! static char lpageCopy[BLCKSZ];
!
! *prdata = rdata;
! data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
! InvalidOffsetNumber : GinGetDownlink(btree->entry);
! data.updateBlkno = dataPrepareData(btree, lpage, off);
!
! maxoff = GinPageGetOpaque(lpage)->maxoff;
!
! /* Copy original data of the page */
! memcpy(lpageCopy, lpage, BLCKSZ);
!
! /* Reinitialize pages */
! GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
! GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);
!
! GinPageGetOpaque(lpage)->maxoff = 0;
! GinPageGetOpaque(rpage)->maxoff = 0;
!
! /* Calculate the whole size we're going to place */
! copyPtr = GinDataPageGetData(lpageCopy);
! iptr.ip_blkid.bi_hi = 0;
! iptr.ip_blkid.bi_lo = 0;
! iptr.ip_posid = 0;
! for (i = FirstOffsetNumber; i <= maxoff; i++)
! {
! if (i == off)
! {
! prevIptr = iptr;
! iptr = btree->items[maxItemIndex];
!
! prevTotalsize = totalsize;
! totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
!
! maxItemIndex++;
! totalCount++;
! maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
! }
!
! prevIptr = iptr;
! copyPtr = ginDataPageLeafReadItemPointer(copyPtr, &iptr);
!
! prevTotalsize = totalsize;
! totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
!
! totalCount++;
! maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
! }
!
! if (off == maxoff + 1)
! {
! prevIptr = iptr;
! iptr = btree->items[maxItemIndex];
! if (GinPageRightMost(lpage))
! {
! Size newTotalsize;
!
! /*
! * Found how many new item pointer we're going to add using
! * worst case assumptions about odd placement and alignment.
! */
! while (maxItemIndex < btree->nitem &&
! (newTotalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize)) <
! 2 * GinDataPageSize - 2 * maxItemSize - 2 * MAXIMUM_ALIGNOF
! )
{
! maxItemIndex++;
! totalCount++;
! maxItemSize = Max(maxItemSize, newTotalsize - totalsize);
! totalsize = newTotalsize;
!
! prevIptr = iptr;
! if (maxItemIndex < btree->nitem)
! iptr = btree->items[maxItemIndex];
}
}
else
{
! prevTotalsize = totalsize;
! totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
! maxItemIndex++;
!
! totalCount++;
! maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
! }
! }
!
! /*
! * Place item pointers with additional information to the pages using
! * previous calculations. XXX: what does this do now that I removed the
! * additional information stuff from the patch?
! */
! ptr = GinDataPageGetData(lpage);
! page = lpage;
! j = FirstOffsetNumber;
! iptr.ip_blkid.bi_hi = 0;
! iptr.ip_blkid.bi_lo = 0;
! iptr.ip_posid = 0;
! prevIptr = iptr;
! copyPtr = GinDataPageGetData(lpageCopy);
! for (i = FirstOffsetNumber; i <= maxoff; i++)
! {
! if (i == off)
! {
! while (btree->curitem < maxItemIndex)
! {
! iptr = btree->items[btree->curitem];
!
! ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
! Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
!
! btree->curitem++;
! prevIptr = iptr;
!
! CHECK_SWITCH_TO_RPAGE;
! }
! }
!
! copyPtr = ginDataPageLeafReadItemPointer(copyPtr, &iptr);
!
! ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
! Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
!
! prevIptr = iptr;
!
! CHECK_SWITCH_TO_RPAGE;
! }
!
! if (off == maxoff + 1)
! {
! while (btree->curitem < maxItemIndex)
! {
! iptr = btree->items[btree->curitem];
!
! ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
! Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
!
btree->curitem++;
+
+ prevIptr = iptr;
+
+ CHECK_SWITCH_TO_RPAGE;
}
}
!
! GinPageGetOpaque(rpage)->maxoff = j - 1;
!
! PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
! btree->pitem.key = maxLeftIptr;
! btree->rightblkno = BufferGetBlockNumber(rbuf);
!
! *GinDataPageGetRightBound(rpage) = *GinDataPageGetRightBound(lpage);
! *GinDataPageGetRightBound(lpage) = maxLeftIptr;
!
! /* Fill indexes at the end of pages */
! updateItemIndexes(lpage, btree->ginstate);
! updateItemIndexes(rpage, btree->ginstate);
!
! data.node = btree->index->rd_node;
! data.rootBlkno = InvalidBlockNumber;
! data.lblkno = BufferGetBlockNumber(lbuf);
! data.rblkno = BufferGetBlockNumber(rbuf);
! data.separator = GinPageGetOpaque(lpage)->maxoff;
! data.nitem = GinPageGetOpaque(lpage)->maxoff + GinPageGetOpaque(rpage)->maxoff;
! data.isData = TRUE;
! data.isLeaf = TRUE;
! data.isRootSplit = FALSE;
! data.rightbound = *GinDataPageGetRightBound(rpage);
!
! rdata[0].buffer = InvalidBuffer;
! rdata[0].data = (char *) &data;
! rdata[0].len = sizeof(ginxlogSplit);
! rdata[0].next = &rdata[1];
!
! rdata[1].buffer = InvalidBuffer;
! rdata[1].data = GinDataPageGetData(lpage);
! rdata[1].len = GinDataPageSize - GinDataPageGetFreeSpace(lpage);
! rdata[1].next = &rdata[2];
!
! rdata[2].buffer = InvalidBuffer;
! rdata[2].data = GinDataPageGetData(rpage);
! rdata[2].len = GinDataPageSize - GinDataPageGetFreeSpace(rpage);
! rdata[2].next = NULL;
!
! return lpage;
}
/*
***************
*** 461,467 **** dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
* left page
*/
static Page
! dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
{
char *ptr;
OffsetNumber separator;
--- 912,919 ----
* left page
*/
static Page
! dataSplitPageInternal(GinBtree btree, Buffer lbuf, Buffer rbuf,
! OffsetNumber off, XLogRecData **prdata)
{
char *ptr;
OffsetNumber separator;
***************
*** 563,569 **** dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
data.separator = separator;
data.nitem = maxoff;
data.isData = TRUE;
! data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
data.isRootSplit = FALSE;
data.rightbound = oldbound;
--- 1015,1021 ----
data.separator = separator;
data.nitem = maxoff;
data.isData = TRUE;
! data.isLeaf = FALSE;
data.isRootSplit = FALSE;
data.rightbound = oldbound;
***************
*** 581,586 **** dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
--- 1033,1092 ----
}
/*
+ * Split page of posting tree. Calls relevant function of internal of leaf page
+ * because they are handled very different.
+ */
+ static Page
+ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+ XLogRecData **prdata)
+ {
+ if (GinPageIsLeaf(BufferGetPage(lbuf)))
+ return dataSplitPageLeaf(btree, lbuf, rbuf, off, prdata);
+ else
+ return dataSplitPageInternal(btree, lbuf, rbuf, off, prdata);
+ }
+
+ /*
+ * Updates indexes in the end of leaf page which are used for faster search.
+ * Also updates freespace opaque field of page. Returns last item pointer of
+ * page.
+ */
+ ItemPointerData
+ updateItemIndexes(Page page, GinState *ginstate)
+ {
+ Pointer ptr;
+ ItemPointerData iptr;
+ int j = 0, maxoff, i;
+
+ /* Iterate over page */
+
+ maxoff = GinPageGetOpaque(page)->maxoff;
+ ptr = GinDataPageGetData(page);
+ iptr.ip_blkid.bi_lo = 0;
+ iptr.ip_blkid.bi_hi = 0;
+ iptr.ip_posid = 0;
+
+ for (i = FirstOffsetNumber; i <= maxoff; i++)
+ {
+ /* Place next page index entry if it's time to */
+ if (i * (GinDataLeafIndexCount + 1) > (j + 1) * maxoff)
+ {
+ GinPageGetIndexes(page)[j].iptr = iptr;
+ GinPageGetIndexes(page)[j].offsetNumer = i;
+ GinPageGetIndexes(page)[j].pageOffset = ptr - GinDataPageGetData(page);
+ j++;
+ }
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ }
+ /* Fill rest of page indexes with InvalidOffsetNumber if any */
+ for (; j < GinDataLeafIndexCount; j++)
+ {
+ GinPageGetIndexes(page)[j].offsetNumer = InvalidOffsetNumber;
+ }
+ return iptr;
+ }
+
+ /*
* Fills new root by right bound values from child.
* Also called from ginxlog, should not use btree
*/
*** a/src/backend/access/gin/ginentrypage.c
--- b/src/backend/access/gin/ginentrypage.c
***************
*** 18,162 ****
#include "utils/rel.h"
/*
! * Form a tuple for entry tree.
! *
! * If the tuple would be too big to be stored, function throws a suitable
! * error if errorTooBig is TRUE, or returns NULL if errorTooBig is FALSE.
! *
! * See src/backend/access/gin/README for a description of the index tuple
! * format that is being built here. We build on the assumption that we
! * are making a leaf-level key entry containing a posting list of nipd items.
! * If the caller is actually trying to make a posting-tree entry, non-leaf
! * entry, or pending-list entry, it should pass nipd = 0 and then overwrite
! * the t_tid fields as necessary. In any case, ipd can be NULL to skip
! * copying any itempointers into the posting list; the caller is responsible
! * for filling the posting list afterwards, if ipd = NULL and nipd > 0.
*/
! IndexTuple
! GinFormTuple(GinState *ginstate,
! OffsetNumber attnum, Datum key, GinNullCategory category,
! ItemPointerData *ipd, uint32 nipd,
! bool errorTooBig)
{
! Datum datums[2];
! bool isnull[2];
! IndexTuple itup;
! uint32 newsize;
!
! /* Build the basic tuple: optional column number, plus key datum */
! if (ginstate->oneCol)
! {
! datums[0] = key;
! isnull[0] = (category != GIN_CAT_NORM_KEY);
! }
! else
! {
! datums[0] = UInt16GetDatum(attnum);
! isnull[0] = false;
! datums[1] = key;
! isnull[1] = (category != GIN_CAT_NORM_KEY);
! }
!
! itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
!
! /*
! * Determine and store offset to the posting list, making sure there is
! * room for the category byte if needed.
! *
! * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
! * be some wasted pad space. Is it worth recomputing the data length to
! * prevent that? That would also allow us to Assert that the real data
! * doesn't overlap the GinNullCategory byte, which this code currently
! * takes on faith.
! */
! newsize = IndexTupleSize(itup);
!
! if (IndexTupleHasNulls(itup))
! {
! uint32 minsize;
!
! Assert(category != GIN_CAT_NORM_KEY);
! minsize = GinCategoryOffset(itup, ginstate) + sizeof(GinNullCategory);
! newsize = Max(newsize, minsize);
! }
!
! newsize = SHORTALIGN(newsize);
!
! GinSetPostingOffset(itup, newsize);
!
! GinSetNPosting(itup, nipd);
!
! /*
! * Add space needed for posting list, if any. Then check that the tuple
! * won't be too big to store.
! */
! newsize += sizeof(ItemPointerData) * nipd;
! newsize = MAXALIGN(newsize);
! if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
! {
! if (errorTooBig)
! ereport(ERROR,
! (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
! errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
! (unsigned long) newsize,
! (unsigned long) Min(INDEX_SIZE_MASK,
! GinMaxItemSize),
! RelationGetRelationName(ginstate->index))));
! pfree(itup);
! return NULL;
! }
! /*
! * Resize tuple if needed
! */
! if (newsize != IndexTupleSize(itup))
! {
! itup = repalloc(itup, newsize);
! /* set new size in tuple header */
! itup->t_info &= ~INDEX_SIZE_MASK;
! itup->t_info |= newsize;
! }
!
! /*
! * Insert category byte, if needed
! */
! if (category != GIN_CAT_NORM_KEY)
{
! Assert(IndexTupleHasNulls(itup));
! GinSetNullCategory(itup, ginstate, category);
}
-
- /*
- * Copy in the posting list, if provided
- */
- if (ipd)
- memcpy(GinGetPosting(itup), ipd, sizeof(ItemPointerData) * nipd);
-
- return itup;
- }
-
- /*
- * Sometimes we reduce the number of posting list items in a tuple after
- * having built it with GinFormTuple. This function adjusts the size
- * fields to match.
- */
- void
- GinShortenTuple(IndexTuple itup, uint32 nipd)
- {
- uint32 newsize;
-
- Assert(nipd <= GinGetNPosting(itup));
-
- newsize = GinGetPostingOffset(itup) + sizeof(ItemPointerData) * nipd;
- newsize = MAXALIGN(newsize);
-
- Assert(newsize <= (itup->t_info & INDEX_SIZE_MASK));
-
- itup->t_info &= ~INDEX_SIZE_MASK;
- itup->t_info |= newsize;
-
- GinSetNPosting(itup, nipd);
}
/*
--- 18,41 ----
#include "utils/rel.h"
/*
! * Read item pointers from leaf data page. Information is stored in the same
! * manner as in leaf data pages.
*/
! void
! ginReadTuple(GinState *ginstate, OffsetNumber attnum,
! IndexTuple itup, ItemPointerData *ipd)
{
! Pointer ptr;
! int nipd = GinGetNPosting(itup), i;
! ItemPointerData ip = {{0,0},0};
! ptr = GinGetPosting(itup);
! for (i = 0; i < nipd; i++)
{
! ptr = ginDataPageLeafReadItemPointer(ptr, &ip);
! ipd[i] = ip;
}
}
/*
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 23,28 ****
--- 23,29 ----
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+ #include "access/htup_details.h"
#define GIN_PAGE_FREESIZE \
***************
*** 437,442 **** ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
--- 438,526 ----
ginInsertCleanup(ginstate, false, NULL);
}
+ static IndexTuple
+ GinFastFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category)
+ {
+ Datum datums[2];
+ bool isnull[2];
+ IndexTuple itup;
+ uint32 newsize;
+
+ /* Build the basic tuple: optional column number, plus key datum */
+
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Place category to the last byte of index tuple extending it's size if
+ * needed
+ */
+ newsize = IndexTupleSize(itup);
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ uint32 minsize;
+
+ Assert(IndexTupleHasNulls(itup));
+ minsize = IndexInfoFindDataOffset(itup->t_info) +
+ heap_compute_data_size(ginstate->tupdesc[attnum - 1], datums, isnull) +
+ sizeof(GinNullCategory);
+ newsize = Max(newsize, minsize);
+ }
+
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+
+ return itup;
+ }
+
+
/*
* Create temporary index tuples for a single indexable item (one index column
* for the heap tuple specified by ht_ctid), and append them to the array
***************
*** 486,493 **** ginHeapTupleFastCollect(GinState *ginstate,
{
IndexTuple itup;
! itup = GinFormTuple(ginstate, attnum, entries[i], categories[i],
! NULL, 0, true);
itup->t_tid = *ht_ctid;
collector->tuples[collector->ntuples++] = itup;
collector->sumsize += IndexTupleSize(itup);
--- 570,576 ----
{
IndexTuple itup;
! itup = GinFastFormTuple(ginstate, attnum, entries[i], categories[i]);
itup->t_tid = *ht_ctid;
collector->tuples[collector->ntuples++] = itup;
collector->sumsize += IndexTupleSize(itup);
*** a/src/backend/access/gin/ginget.c
--- b/src/backend/access/gin/ginget.c
***************
*** 73,90 **** findItemInPostingPage(Page page, ItemPointer item, OffsetNumber *off)
{
OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
int res;
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
/* page was deleted by concurrent vacuum */
return false;
/*
* scan page to find equal or first greater value
*/
for (*off = FirstOffsetNumber; *off <= maxoff; (*off)++)
{
! res = ginCompareItemPointers(item, (ItemPointer) GinDataPageGetItem(page, *off));
if (res <= 0)
return true;
}
--- 73,94 ----
{
OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
int res;
+ Pointer ptr;
+ ItemPointerData iptr = {{0, 0}, 0};
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
/* page was deleted by concurrent vacuum */
return false;
+ ptr = GinDataPageGetData(page);
/*
* scan page to find equal or first greater value
*/
for (*off = FirstOffsetNumber; *off <= maxoff; (*off)++)
{
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ res = ginCompareItemPointers(item, &iptr);
if (res <= 0)
return true;
}
***************
*** 148,162 **** scanPostingTree(Relation index, GinScanEntry scanEntry,
*/
for (;;)
{
page = BufferGetPage(buffer);
if ((GinPageGetOpaque(page)->flags & GIN_DELETED) == 0 &&
! GinPageGetOpaque(page)->maxoff >= FirstOffsetNumber)
{
! tbm_add_tuples(scanEntry->matchBitmap,
! (ItemPointer) GinDataPageGetItem(page, FirstOffsetNumber),
! GinPageGetOpaque(page)->maxoff, false);
! scanEntry->predictNumberResult += GinPageGetOpaque(page)->maxoff;
}
if (GinPageRightMost(page))
--- 152,176 ----
*/
for (;;)
{
+ OffsetNumber maxoff, i;
+
page = BufferGetPage(buffer);
+ maxoff = GinPageGetOpaque(page)->maxoff;
if ((GinPageGetOpaque(page)->flags & GIN_DELETED) == 0 &&
! maxoff >= FirstOffsetNumber)
{
! ItemPointerData iptr = {{0, 0}, 0};
! Pointer ptr;
!
! ptr = GinDataPageGetData(page);
! for (i = FirstOffsetNumber; i <= maxoff; i++)
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! tbm_add_tuples(scanEntry->matchBitmap, &iptr, 1, false);
! }
!
! scanEntry->predictNumberResult += maxoff;
}
if (GinPageRightMost(page))
***************
*** 344,351 **** collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
}
else
{
tbm_add_tuples(scanEntry->matchBitmap,
! GinGetPosting(itup), GinGetNPosting(itup), false);
scanEntry->predictNumberResult += GinGetNPosting(itup);
}
--- 358,369 ----
}
else
{
+ ItemPointerData *ipd = (ItemPointerData *)palloc(
+ sizeof(ItemPointerData) * GinGetNPosting(itup));
+ ginReadTuple(btree->ginstate, scanEntry->attnum, itup, ipd);
+
tbm_add_tuples(scanEntry->matchBitmap,
! ipd, GinGetNPosting(itup), false);
scanEntry->predictNumberResult += GinGetNPosting(itup);
}
***************
*** 438,443 **** restartScanEntry:
--- 456,464 ----
BlockNumber rootPostingTree = GinGetPostingTree(itup);
GinPostingTreeScan *gdi;
Page page;
+ OffsetNumber maxoff, i;
+ Pointer ptr;
+ ItemPointerData iptr = {{0,0},0};
/*
* We should unlock entry page before touching posting tree to
***************
*** 465,474 **** restartScanEntry:
/*
* Keep page content in memory to prevent durable page locking
*/
! entry->list = (ItemPointerData *) palloc(BLCKSZ);
! entry->nlist = GinPageGetOpaque(page)->maxoff;
! memcpy(entry->list, GinDataPageGetItem(page, FirstOffsetNumber),
! GinPageGetOpaque(page)->maxoff * sizeof(ItemPointerData));
LockBuffer(entry->buffer, GIN_UNLOCK);
freeGinBtreeStack(gdi->stack);
--- 486,502 ----
/*
* Keep page content in memory to prevent durable page locking
*/
! entry->list = (ItemPointerData *) palloc(BLCKSZ * sizeof(ItemPointerData));
! maxoff = GinPageGetOpaque(page)->maxoff;
! entry->nlist = maxoff;
!
! ptr = GinDataPageGetData(page);
!
! for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! entry->list[i - FirstOffsetNumber] = iptr;
! }
LockBuffer(entry->buffer, GIN_UNLOCK);
freeGinBtreeStack(gdi->stack);
***************
*** 478,485 **** restartScanEntry:
else if (GinGetNPosting(itup) > 0)
{
entry->nlist = GinGetNPosting(itup);
entry->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * entry->nlist);
! memcpy(entry->list, GinGetPosting(itup), sizeof(ItemPointerData) * entry->nlist);
entry->isFinished = FALSE;
}
}
--- 506,516 ----
else if (GinGetNPosting(itup) > 0)
{
entry->nlist = GinGetNPosting(itup);
+ entry->predictNumberResult = entry->nlist;
entry->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * entry->nlist);
!
! ginReadTuple(ginstate, entry->attnum, itup, entry->list);
!
entry->isFinished = FALSE;
}
}
***************
*** 583,594 **** entryGetNextItem(GinState *ginstate, GinScanEntry entry)
if (!ItemPointerIsValid(&entry->curItem) ||
findItemInPostingPage(page, &entry->curItem, &entry->offset))
{
/*
* Found position equal to or greater than stored
*/
! entry->nlist = GinPageGetOpaque(page)->maxoff;
! memcpy(entry->list, GinDataPageGetItem(page, FirstOffsetNumber),
! GinPageGetOpaque(page)->maxoff * sizeof(ItemPointerData));
LockBuffer(entry->buffer, GIN_UNLOCK);
--- 614,636 ----
if (!ItemPointerIsValid(&entry->curItem) ||
findItemInPostingPage(page, &entry->curItem, &entry->offset))
{
+ OffsetNumber maxoff, i;
+ Pointer ptr;
+ ItemPointerData iptr = {{0,0},0};
+
/*
* Found position equal to or greater than stored
*/
! maxoff = GinPageGetOpaque(page)->maxoff;
! entry->nlist = maxoff;
!
! ptr = GinDataPageGetData(page);
!
! for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! entry->list[i - FirstOffsetNumber] = iptr;
! }
LockBuffer(entry->buffer, GIN_UNLOCK);
*** a/src/backend/access/gin/gininsert.c
--- b/src/backend/access/gin/gininsert.c
***************
*** 42,55 **** typedef struct
* items[] must be in sorted order with no duplicates.
*/
static BlockNumber
! createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
{
BlockNumber blkno;
! Buffer buffer = GinNewBuffer(index);
Page page;
/* Assert that the items[] array will fit on one page */
- Assert(nitems <= GinMaxLeafDataItems);
START_CRIT_SECTION();
--- 42,57 ----
* items[] must be in sorted order with no duplicates.
*/
static BlockNumber
! createPostingTree(GinState *ginstate, ItemPointerData *items, uint32 nitems)
{
BlockNumber blkno;
! Buffer buffer = GinNewBuffer(ginstate->index);
Page page;
+ int i;
+ Pointer ptr;
+ ItemPointerData prev_iptr = {{0,0},0};
/* Assert that the items[] array will fit on one page */
START_CRIT_SECTION();
***************
*** 57,74 **** createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
page = BufferGetPage(buffer);
blkno = BufferGetBlockNumber(buffer);
- memcpy(GinDataPageGetData(page), items, sizeof(ItemPointerData) * nitems);
GinPageGetOpaque(page)->maxoff = nitems;
MarkBufferDirty(buffer);
! if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
ginxlogCreatePostingTree data;
! data.node = index->rd_node;
data.blkno = blkno;
data.nitem = nitems;
--- 59,84 ----
page = BufferGetPage(buffer);
blkno = BufferGetBlockNumber(buffer);
GinPageGetOpaque(page)->maxoff = nitems;
+ ptr = GinDataPageGetData(page);
+ for (i = 0; i < nitems; i++)
+ {
+ if (i > 0)
+ prev_iptr = items[i - 1];
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &items[i], &prev_iptr);
+ }
+ Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
+ updateItemIndexes(page, ginstate);
MarkBufferDirty(buffer);
! if (RelationNeedsWAL(ginstate->index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
ginxlogCreatePostingTree data;
! data.node = ginstate->index->rd_node;
data.blkno = blkno;
data.nitem = nitems;
***************
*** 78,85 **** createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
rdata[0].next = &rdata[1];
rdata[1].buffer = InvalidBuffer;
! rdata[1].data = (char *) items;
! rdata[1].len = sizeof(ItemPointerData) * nitems;
rdata[1].next = NULL;
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_PTREE, rdata);
--- 88,95 ----
rdata[0].next = &rdata[1];
rdata[1].buffer = InvalidBuffer;
! rdata[1].data = GinDataPageGetData(page);
! rdata[1].len = GinDataPageSize - GinDataPageGetFreeSpace(page);
rdata[1].next = NULL;
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_PTREE, rdata);
***************
*** 93,98 **** createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
--- 103,239 ----
return blkno;
}
+ /*
+ * Form a tuple for entry tree.
+ *
+ * If the tuple would be too big to be stored, function throws a suitable
+ * error if errorTooBig is TRUE, or returns NULL if errorTooBig is FALSE.
+ *
+ * See src/backend/access/gin/README for a description of the index tuple
+ * format that is being built here. We build on the assumption that we
+ * are making a leaf-level key entry containing a posting list of nipd items.
+ * If the caller is actually trying to make a posting-tree entry, non-leaf
+ * entry, or pending-list entry, it should pass nipd = 0 and then overwrite
+ * the t_tid fields as necessary. In any case, ipd can be NULL to skip
+ * copying any itempointers into the posting list; the caller is responsible
+ * for filling the posting list afterwards, if ipd = NULL and nipd > 0.
+ */
+ static IndexTuple
+ GinFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category,
+ ItemPointerData *ipd,
+ uint32 nipd,
+ bool errorTooBig)
+ {
+ Datum datums[3];
+ bool isnull[3];
+ IndexTuple itup;
+ uint32 newsize;
+ int i;
+ ItemPointerData nullItemPointer = {{0,0},0};
+
+ /* Build the basic tuple: optional column number, plus key datum */
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ isnull[1] = true;
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ isnull[2] = true;
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Determine and store offset to the posting list, making sure there is
+ * room for the category byte if needed.
+ *
+ * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
+ * be some wasted pad space. Is it worth recomputing the data length to
+ * prevent that? That would also allow us to Assert that the real data
+ * doesn't overlap the GinNullCategory byte, which this code currently
+ * takes on faith.
+ */
+ newsize = IndexTupleSize(itup);
+
+ GinSetPostingOffset(itup, newsize);
+
+ GinSetNPosting(itup, nipd);
+
+ /*
+ * Add space needed for posting list, if any. Then check that the tuple
+ * won't be too big to store.
+ */
+
+ if (nipd > 0)
+ {
+ newsize = ginCheckPlaceToDataPageLeaf(&ipd[0], &nullItemPointer, newsize);
+ for (i = 1; i < nipd; i++)
+ {
+ newsize = ginCheckPlaceToDataPageLeaf(&ipd[i], &ipd[i - 1], newsize);
+ }
+ }
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ newsize = newsize + sizeof(GinNullCategory);
+ }
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ if (errorTooBig)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Copy in the posting list, if provided
+ */
+ if (nipd > 0)
+ {
+ char *ptr = GinGetPosting(itup);
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &ipd[0], &nullItemPointer);
+ for (i = 1; i < nipd; i++)
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &ipd[i], &ipd[i-1]);
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+ return itup;
+ }
/*
* Adds array of item pointers to tuple's posting list, or
***************
*** 111,141 **** addItemPointersToLeafTuple(GinState *ginstate,
Datum key;
GinNullCategory category;
IndexTuple res;
Assert(!GinIsPostingTree(old));
attnum = gintuple_get_attrnum(ginstate, old);
key = gintuple_get_key(ginstate, old, &category);
/* try to build tuple with room for all the items */
res = GinFormTuple(ginstate, attnum, key, category,
! NULL, nitem + GinGetNPosting(old),
! false);
!
! if (res)
! {
! /* good, small enough */
! uint32 newnitem;
!
! /* fill in the posting list with union of old and new TIDs */
! newnitem = ginMergeItemPointers(GinGetPosting(res),
! GinGetPosting(old),
! GinGetNPosting(old),
! items, nitem);
! /* merge might have eliminated some duplicate items */
! GinShortenTuple(res, newnitem);
! }
! else
{
/* posting list would be too big, convert to posting tree */
BlockNumber postingRoot;
--- 252,281 ----
Datum key;
GinNullCategory category;
IndexTuple res;
+ ItemPointerData *newItems, *oldItems;
+ int oldNPosting, newNPosting;
Assert(!GinIsPostingTree(old));
attnum = gintuple_get_attrnum(ginstate, old);
key = gintuple_get_key(ginstate, old, &category);
+ oldNPosting = GinGetNPosting(old);
+ oldItems = (ItemPointerData *) palloc(sizeof(ItemPointerData) * oldNPosting);
+
+ newNPosting = oldNPosting + nitem;
+ newItems = (ItemPointerData *) palloc(sizeof(ItemPointerData) * newNPosting);
+
+ ginReadTuple(ginstate, attnum, old, oldItems);
+
+ newNPosting = ginMergeItemPointers(newItems,
+ items, nitem,
+ oldItems, oldNPosting);
+
/* try to build tuple with room for all the items */
res = GinFormTuple(ginstate, attnum, key, category,
! newItems, newNPosting, false);
! if (!res)
{
/* posting list would be too big, convert to posting tree */
BlockNumber postingRoot;
***************
*** 146,154 **** addItemPointersToLeafTuple(GinState *ginstate,
* surely small enough to fit on one posting-tree page, and should
* already be in order with no duplicates.
*/
! postingRoot = createPostingTree(ginstate->index,
! GinGetPosting(old),
! GinGetNPosting(old));
/* During index build, count the newly-added data page */
if (buildStats)
--- 286,294 ----
* surely small enough to fit on one posting-tree page, and should
* already be in order with no duplicates.
*/
! postingRoot = createPostingTree(ginstate,
! oldItems,
! oldNPosting);
/* During index build, count the newly-added data page */
if (buildStats)
***************
*** 194,199 **** buildFreshLeafTuple(GinState *ginstate,
--- 334,353 ----
{
/* posting list would be too big, build posting tree */
BlockNumber postingRoot;
+ ItemPointerData prevIptr = {{0,0},0};
+ Size size = 0;
+ int itemsCount = 0;
+
+ do
+ {
+ size = ginCheckPlaceToDataPageLeaf(&items[itemsCount], &prevIptr,
+ size);
+ prevIptr = items[itemsCount];
+ itemsCount++;
+ }
+ while (itemsCount < nitem && size < GinDataPageSize);
+ itemsCount--;
+
/*
* Build posting-tree-only result tuple. We do this first so as to
***************
*** 205,220 **** buildFreshLeafTuple(GinState *ginstate,
* Initialize posting tree with as many TIDs as will fit on the first
* page.
*/
! postingRoot = createPostingTree(ginstate->index,
items,
! Min(nitem, GinMaxLeafDataItems));
/* During index build, count the newly-added data page */
if (buildStats)
buildStats->nDataPages++;
/* Add any remaining TIDs to the posting tree */
! if (nitem > GinMaxLeafDataItems)
{
GinPostingTreeScan *gdi;
--- 359,374 ----
* Initialize posting tree with as many TIDs as will fit on the first
* page.
*/
! postingRoot = createPostingTree(ginstate,
items,
! itemsCount);
/* During index build, count the newly-added data page */
if (buildStats)
buildStats->nDataPages++;
/* Add any remaining TIDs to the posting tree */
! if (nitem > itemsCount)
{
GinPostingTreeScan *gdi;
***************
*** 222,229 **** buildFreshLeafTuple(GinState *ginstate,
gdi->btree.isBuild = (buildStats != NULL);
ginInsertItemPointers(gdi,
! items + GinMaxLeafDataItems,
! nitem - GinMaxLeafDataItems,
buildStats);
pfree(gdi);
--- 376,383 ----
gdi->btree.isBuild = (buildStats != NULL);
ginInsertItemPointers(gdi,
! items + itemsCount,
! nitem - itemsCount,
buildStats);
pfree(gdi);
*** a/src/backend/access/gin/ginvacuum.c
--- b/src/backend/access/gin/ginvacuum.c
***************
*** 41,80 **** typedef struct
*/
static uint32
! ginVacuumPostingList(GinVacuumState *gvs, ItemPointerData *items, uint32 nitem, ItemPointerData **cleaned)
{
uint32 i,
j = 0;
/*
* just scan over ItemPointer array
*/
for (i = 0; i < nitem; i++)
{
! if (gvs->callback(items + i, gvs->callback_state))
{
gvs->result->tuples_removed += 1;
! if (!*cleaned)
{
! *cleaned = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitem);
if (i != 0)
! memcpy(*cleaned, items, sizeof(ItemPointerData) * i);
}
}
else
{
gvs->result->num_index_tuples += 1;
if (i != j)
! (*cleaned)[j] = items[i];
j++;
}
}
return j;
}
/*
* fills WAL record for vacuum leaf page
*/
static void
--- 41,206 ----
*/
static uint32
! ginVacuumPostingList(GinVacuumState *gvs, Pointer src, uint32 nitem, Pointer *cleaned, Size size, Size *newSize)
{
uint32 i,
j = 0;
+ ItemPointerData iptr = {{0,0},0}, prevIptr;
+ Pointer dst = NULL, prev, ptr = src;
/*
* just scan over ItemPointer array
*/
+ prevIptr = iptr;
for (i = 0; i < nitem; i++)
{
! prev = ptr;
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! if (gvs->callback(&iptr, gvs->callback_state))
{
gvs->result->tuples_removed += 1;
! if (!dst)
{
! dst = (Pointer) palloc(size);
! *cleaned = dst;
if (i != 0)
! {
! memcpy(dst, src, prev - src);
! dst += prev - src;
! }
}
}
else
{
gvs->result->num_index_tuples += 1;
if (i != j)
! dst = ginDataPageLeafWriteItemPointer(dst, &iptr, &prevIptr);
j++;
+ prevIptr = iptr;
}
}
+ if (i != j)
+ *newSize = dst - *cleaned;
return j;
}
/*
+ * Form a tuple for entry tree based on already encoded array of item pointers
+ * with additional information.
+ */
+ static IndexTuple
+ GinFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category,
+ Pointer data,
+ Size dataSize,
+ uint32 nipd,
+ bool errorTooBig)
+ {
+ Datum datums[3];
+ bool isnull[3];
+ IndexTuple itup;
+ uint32 newsize;
+
+ /* Build the basic tuple: optional column number, plus key datum */
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ isnull[1] = true;
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ isnull[2] = true;
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Determine and store offset to the posting list, making sure there is
+ * room for the category byte if needed.
+ *
+ * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
+ * be some wasted pad space. Is it worth recomputing the data length to
+ * prevent that? That would also allow us to Assert that the real data
+ * doesn't overlap the GinNullCategory byte, which this code currently
+ * takes on faith.
+ */
+ newsize = IndexTupleSize(itup);
+
+ GinSetPostingOffset(itup, newsize);
+
+ GinSetNPosting(itup, nipd);
+
+ /*
+ * Add space needed for posting list, if any. Then check that the tuple
+ * won't be too big to store.
+ */
+
+ if (nipd > 0)
+ {
+ newsize += dataSize;
+ }
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ newsize = newsize + sizeof(GinNullCategory);
+ }
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ if (errorTooBig)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Copy in the posting list, if provided
+ */
+ if (nipd > 0)
+ {
+ char *ptr = GinGetPosting(itup);
+ memcpy(ptr, data, dataSize);
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+ return itup;
+ }
+
+ /*
* fills WAL record for vacuum leaf page
*/
static void
***************
*** 101,107 **** xlogVacuumPage(Relation index, Buffer buffer)
backup = GinDataPageGetData(page);
data.nitem = GinPageGetOpaque(page)->maxoff;
if (data.nitem)
! len = MAXALIGN(sizeof(ItemPointerData) * data.nitem);
}
else
{
--- 227,233 ----
backup = GinDataPageGetData(page);
data.nitem = GinPageGetOpaque(page)->maxoff;
if (data.nitem)
! len = MAXALIGN(GinDataPageSize - GinDataPageGetFreeSpace(page));
}
else
{
***************
*** 178,187 **** ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
{
OffsetNumber newMaxOff,
oldMaxOff = GinPageGetOpaque(page)->maxoff;
! ItemPointerData *cleaned = NULL;
newMaxOff = ginVacuumPostingList(gvs,
! (ItemPointer) GinDataPageGetData(page), oldMaxOff, &cleaned);
/* saves changes about deleted tuple ... */
if (oldMaxOff != newMaxOff)
--- 304,315 ----
{
OffsetNumber newMaxOff,
oldMaxOff = GinPageGetOpaque(page)->maxoff;
! Pointer cleaned = NULL;
! Size newSize;
newMaxOff = ginVacuumPostingList(gvs,
! GinDataPageGetData(page), oldMaxOff, &cleaned,
! GinDataPageSize - GinDataPageGetFreeSpace(page), &newSize);
/* saves changes about deleted tuple ... */
if (oldMaxOff != newMaxOff)
***************
*** 189,197 **** ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
START_CRIT_SECTION();
if (newMaxOff > 0)
! memcpy(GinDataPageGetData(page), cleaned, sizeof(ItemPointerData) * newMaxOff);
pfree(cleaned);
GinPageGetOpaque(page)->maxoff = newMaxOff;
MarkBufferDirty(buffer);
xlogVacuumPage(gvs->index, buffer);
--- 317,327 ----
START_CRIT_SECTION();
if (newMaxOff > 0)
! memcpy(GinDataPageGetData(page), cleaned, newSize);
!
pfree(cleaned);
GinPageGetOpaque(page)->maxoff = newMaxOff;
+ updateItemIndexes(page, &gvs->ginstate);
MarkBufferDirty(buffer);
xlogVacuumPage(gvs->index, buffer);
***************
*** 520,527 **** ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
* if we already create temporary page, we will make changes in
* place
*/
! ItemPointerData *cleaned = (tmppage == origpage) ? NULL : GinGetPosting(itup);
! uint32 newN = ginVacuumPostingList(gvs, GinGetPosting(itup), GinGetNPosting(itup), &cleaned);
if (GinGetNPosting(itup) != newN)
{
--- 650,662 ----
* if we already create temporary page, we will make changes in
* place
*/
! Size cleanedSize;
! Pointer cleaned = NULL;
! uint32 newN =
! ginVacuumPostingList(gvs,
! GinGetPosting(itup), GinGetNPosting(itup), &cleaned,
! IndexTupleSize(itup) - GinGetPostingOffset(itup),
! &cleanedSize);
if (GinGetNPosting(itup) != newN)
{
***************
*** 542,564 **** ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
*/
tmppage = PageGetTempPageCopy(origpage);
- if (newN > 0)
- {
- Size pos = ((char *) GinGetPosting(itup)) - ((char *) origpage);
-
- memcpy(tmppage + pos, cleaned, sizeof(ItemPointerData) * newN);
- }
-
- pfree(cleaned);
-
/* set itup pointer to new page */
itup = (IndexTuple) PageGetItem(tmppage, PageGetItemId(tmppage, i));
}
attnum = gintuple_get_attrnum(&gvs->ginstate, itup);
key = gintuple_get_key(&gvs->ginstate, itup, &category);
itup = GinFormTuple(&gvs->ginstate, attnum, key, category,
! GinGetPosting(itup), newN, true);
PageIndexTupleDelete(tmppage, i);
if (PageAddItem(tmppage, (Item) itup, IndexTupleSize(itup), i, false, false) != i)
--- 677,692 ----
*/
tmppage = PageGetTempPageCopy(origpage);
/* set itup pointer to new page */
itup = (IndexTuple) PageGetItem(tmppage, PageGetItemId(tmppage, i));
}
attnum = gintuple_get_attrnum(&gvs->ginstate, itup);
key = gintuple_get_key(&gvs->ginstate, itup, &category);
+ /* FIXME */
itup = GinFormTuple(&gvs->ginstate, attnum, key, category,
! cleaned, cleanedSize, newN, true);
! pfree(cleaned);
PageIndexTupleDelete(tmppage, i);
if (PageAddItem(tmppage, (Item) itup, IndexTupleSize(itup), i, false, false) != i)
*** a/src/backend/access/gin/ginxlog.c
--- b/src/backend/access/gin/ginxlog.c
***************
*** 106,114 **** static void
ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
{
ginxlogCreatePostingTree *data = (ginxlogCreatePostingTree *) XLogRecGetData(record);
! ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogCreatePostingTree));
Buffer buffer;
Page page;
/* Backup blocks are not used in create_ptree records */
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
--- 106,117 ----
ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
{
ginxlogCreatePostingTree *data = (ginxlogCreatePostingTree *) XLogRecGetData(record);
! Pointer ptr = XLogRecGetData(record) + sizeof(ginxlogCreatePostingTree), tmp;
Buffer buffer;
Page page;
+ GinState ginstate;
+ ItemPointerData iptr = {{0, 0}, 0};
+ OffsetNumber i;
/* Backup blocks are not used in create_ptree records */
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
***************
*** 118,128 **** ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
page = (Page) BufferGetPage(buffer);
GinInitBuffer(buffer, GIN_DATA | GIN_LEAF);
! memcpy(GinDataPageGetData(page), items, sizeof(ItemPointerData) * data->nitem);
GinPageGetOpaque(page)->maxoff = data->nitem;
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
UnlockReleaseBuffer(buffer);
}
--- 121,139 ----
page = (Page) BufferGetPage(buffer);
GinInitBuffer(buffer, GIN_DATA | GIN_LEAF);
!
! tmp = ptr;
! for (i = 1; i <= data->nitem; i++)
! tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
!
! memcpy(GinDataPageGetData(page), ptr, tmp - ptr);
!
GinPageGetOpaque(page)->maxoff = data->nitem;
PageSetLSN(page, lsn);
+ updateItemIndexes(page, &ginstate);
+
MarkBufferDirty(buffer);
UnlockReleaseBuffer(buffer);
}
***************
*** 182,195 **** ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
if (data->isLeaf)
{
! OffsetNumber i;
! ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
Assert(GinPageIsLeaf(page));
Assert(data->updateBlkno == InvalidBlockNumber);
! for (i = 0; i < data->nitem; i++)
! GinDataPageAddItem(page, items + i, data->offset + i);
}
else
{
--- 193,241 ----
if (data->isLeaf)
{
! ItemPointer startIptr = (ItemPointer) (XLogRecGetData(record) + sizeof(ginxlogInsert));
! OffsetNumber i, maxoff = GinPageGetOpaque(page)->maxoff, j;
! Pointer dataPtr = (Pointer)(startIptr + 1);
! GinState ginstate;
! ItemPointerData iptr = {{0, 0}, 0}, prev_iptr;
! char pageCopy[BLCKSZ];
! Pointer ptr, destPtr, dataFinish;
Assert(GinPageIsLeaf(page));
Assert(data->updateBlkno == InvalidBlockNumber);
! memcpy(pageCopy, page, BLCKSZ);
! ptr = GinDataPageGetData(page);
! for (i = 1; i < data->offset ; i++)
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! }
!
! ptr = GinDataPageGetData(pageCopy);
! for (i = 1; i < data->offset ; i++)
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! }
!
! prev_iptr = iptr;
! destPtr = page + (ptr - pageCopy);
!
! dataFinish = dataPtr;
! for (j = 0; j < data->nitem; j++)
! dataFinish = ginDataPageLeafReadItemPointer(dataFinish, &prev_iptr);
!
! memcpy(destPtr, dataPtr, dataFinish - dataPtr);
! destPtr += dataFinish - dataPtr;
!
! for (; i <= maxoff; i++)
! {
! ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
! destPtr = ginDataPageLeafWriteItemPointer(destPtr, &iptr, &prev_iptr);
! prev_iptr = iptr;
! }
!
! GinPageGetOpaque(page)->maxoff = maxoff + data->nitem;
! updateItemIndexes(page, &ginstate);
}
else
{
***************
*** 255,260 **** ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
--- 301,307 ----
Page lpage,
rpage;
uint32 flags = 0;
+ GinState ginstate;
if (data->isLeaf)
flags |= GIN_LEAF;
***************
*** 284,310 **** ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
OffsetNumber i;
ItemPointer bound;
! for (i = 0; i < data->separator; i++)
{
! GinDataPageAddItem(lpage, ptr, InvalidOffsetNumber);
! ptr += sizeofitem;
}
!
! for (i = data->separator; i < data->nitem; i++)
{
! GinDataPageAddItem(rpage, ptr, InvalidOffsetNumber);
! ptr += sizeofitem;
! }
! /* set up right key */
! bound = GinDataPageGetRightBound(lpage);
! if (data->isLeaf)
! *bound = *(ItemPointerData *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff);
! else
! *bound = ((PostingItem *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff))->key;
! bound = GinDataPageGetRightBound(rpage);
! *bound = data->rightbound;
}
else
{
--- 331,391 ----
OffsetNumber i;
ItemPointer bound;
! if (data->isLeaf)
{
! Pointer tmp, ptr2;
! ItemPointerData iptr = {{0, 0}, 0};
! Size lsize, rsize;
!
! tmp = ptr;
! for (i = 1; i <= data->separator; i++)
! tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
! lsize = tmp - ptr;
! ptr2 = ptr + MAXALIGN(lsize);
! tmp = ptr2;
! for (; i <= data->nitem; i++)
! tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
! rsize = tmp - ptr2;
!
! Assert(lsize < GinDataPageSize);
! Assert(rsize < GinDataPageSize);
!
! memcpy(GinDataPageGetData(lpage), ptr, lsize);
! memcpy(GinDataPageGetData(rpage), ptr2, rsize);
!
! GinPageGetOpaque(lpage)->maxoff = data->separator;
! GinPageGetOpaque(rpage)->maxoff = data->nitem - data->separator;
! *GinDataPageGetRightBound(lpage) = updateItemIndexes(lpage, &ginstate);
! updateItemIndexes(rpage, &ginstate);
!
! *GinDataPageGetRightBound(rpage) = data->rightbound;
!
! Assert(GinPageGetOpaque(lpage)->flags == flags);
! Assert(GinPageGetOpaque(rpage)->flags == flags);
}
! else
{
! for (i = 0; i < data->separator; i++)
! {
! GinDataPageAddItem(lpage, ptr, InvalidOffsetNumber);
! ptr += sizeofitem;
! }
! for (i = data->separator; i < data->nitem; i++)
! {
! GinDataPageAddItem(rpage, ptr, InvalidOffsetNumber);
! ptr += sizeofitem;
! }
! /* set up right key */
! bound = GinDataPageGetRightBound(lpage);
! if (data->isLeaf)
! *bound = *(ItemPointerData *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff);
! else
! *bound = ((PostingItem *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff))->key;
! bound = GinDataPageGetRightBound(rpage);
! *bound = data->rightbound;
! }
}
else
{
***************
*** 390,399 **** ginRedoVacuumPage(XLogRecPtr lsn, XLogRecord *record)
{
if (GinPageIsData(page))
{
! memcpy(GinDataPageGetData(page),
! XLogRecGetData(record) + sizeof(ginxlogVacuumPage),
! data->nitem * GinSizeOfDataPageItem(page));
! GinPageGetOpaque(page)->maxoff = data->nitem;
}
else
{
--- 471,500 ----
{
if (GinPageIsData(page))
{
! if (GinPageIsLeaf(page))
! {
! GinState ginstate;
! ItemPointerData iptr = {{0, 0}, 0};
! Pointer ptr, tmp;
! OffsetNumber i;
!
! ptr = XLogRecGetData(record) + sizeof(ginxlogVacuumPage);
! tmp = ptr;
! for (i = 1; i <= data->nitem; i++)
! tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
!
! memcpy(GinDataPageGetData(page), ptr, tmp - ptr);
!
! GinPageGetOpaque(page)->maxoff = data->nitem;
! updateItemIndexes(page, &ginstate);
! }
! else
! {
! memcpy(GinDataPageGetData(page),
! XLogRecGetData(record) + sizeof(ginxlogVacuumPage),
! data->nitem * GinSizeOfDataPageItem(page));
! GinPageGetOpaque(page)->maxoff = data->nitem;
! }
}
else
{
***************
*** 802,813 **** ginContinueSplit(ginIncompleteSplit *split)
ginPrepareDataScan(&btree, reln);
PostingItemSetBlockNumber(&(btree.pitem), split->leftBlkno);
! if (GinPageIsLeaf(page))
! btree.pitem.key = *(ItemPointerData *) GinDataPageGetItem(page,
! GinPageGetOpaque(page)->maxoff);
! else
! btree.pitem.key = ((PostingItem *) GinDataPageGetItem(page,
! GinPageGetOpaque(page)->maxoff))->key;
}
btree.rightblkno = split->rightBlkno;
--- 903,909 ----
ginPrepareDataScan(&btree, reln);
PostingItemSetBlockNumber(&(btree.pitem), split->leftBlkno);
! btree.pitem.key = *GinDataPageGetRightBound(page);
}
btree.rightblkno = split->rightBlkno;
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
***************
*** 196,205 **** typedef signed char GinNullCategory;
#define GinCategoryOffset(itup,ginstate) \
(IndexInfoFindDataOffset((itup)->t_info) + \
((ginstate)->oneCol ? 0 : sizeof(int16)))
! #define GinGetNullCategory(itup,ginstate) \
(*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))))
#define GinSetNullCategory(itup,ginstate,c) \
! (*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))) = (c))
/*
* Access macros for leaf-page entry tuples (see discussion in README)
--- 196,211 ----
#define GinCategoryOffset(itup,ginstate) \
(IndexInfoFindDataOffset((itup)->t_info) + \
((ginstate)->oneCol ? 0 : sizeof(int16)))
! /*#define GinGetNullCategory(itup,ginstate) \
(*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))))
#define GinSetNullCategory(itup,ginstate,c) \
! (*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))) = (c))*/
!
! #define GinGetNullCategory(itup,ginstate) \
! (*((GinNullCategory *) ((char*)(itup) + IndexTupleSize(itup) - sizeof(GinNullCategory))))
! #define GinSetNullCategory(itup,ginstate,c) \
! (*((GinNullCategory *) ((char*)(itup) + IndexTupleSize(itup) - sizeof(GinNullCategory))) = (c))
!
/*
* Access macros for leaf-page entry tuples (see discussion in README)
***************
*** 213,223 **** typedef signed char GinNullCategory;
#define GinGetPostingOffset(itup) GinItemPointerGetBlockNumber(&(itup)->t_tid)
#define GinSetPostingOffset(itup,n) ItemPointerSetBlockNumber(&(itup)->t_tid,n)
! #define GinGetPosting(itup) ((ItemPointer) ((char*)(itup) + GinGetPostingOffset(itup)))
#define GinMaxItemSize \
MAXALIGN_DOWN(((BLCKSZ - SizeOfPageHeaderData - \
! MAXALIGN(sizeof(GinPageOpaqueData))) / 3 - sizeof(ItemIdData)))
/*
* Access macros for non-leaf entry tuples
--- 219,229 ----
#define GinGetPostingOffset(itup) GinItemPointerGetBlockNumber(&(itup)->t_tid)
#define GinSetPostingOffset(itup,n) ItemPointerSetBlockNumber(&(itup)->t_tid,n)
! #define GinGetPosting(itup) ((Pointer) ((char*)(itup) + GinGetPostingOffset(itup)))
#define GinMaxItemSize \
MAXALIGN_DOWN(((BLCKSZ - SizeOfPageHeaderData - \
! MAXALIGN(sizeof(GinPageOpaqueData))) / 6 - sizeof(ItemIdData)))
/*
* Access macros for non-leaf entry tuples
***************
*** 255,260 **** typedef signed char GinNullCategory;
--- 261,289 ----
#define GinListPageSize \
( BLCKSZ - SizeOfPageHeaderData - MAXALIGN(sizeof(GinPageOpaqueData)) )
+ typedef struct
+ {
+ ItemPointerData iptr;
+ OffsetNumber offsetNumer;
+ uint16 pageOffset;
+ } GinDataLeafItemIndex;
+
+ #define GinDataLeafIndexCount 32
+
+ #define GinDataPageSize \
+ (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+ - MAXALIGN(sizeof(ItemPointerData)) \
+ - MAXALIGN(sizeof(GinPageOpaqueData)) \
+ - MAXALIGN(sizeof(GinDataLeafItemIndex) * GinDataLeafIndexCount))
+
+ #define GinDataPageFreeSpacePre(page,ptr) \
+ (GinDataPageSize \
+ - ((ptr) - GinDataPageGetData(page)))
+
+ #define GinPageGetIndexes(page) \
+ ((GinDataLeafItemIndex *)(GinDataPageGetData(page) + GinDataPageSize))
+
+
/*
* Storage type for GIN's reloptions
*/
***************
*** 519,536 **** extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
extern void ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno);
/* ginentrypage.c */
- extern IndexTuple GinFormTuple(GinState *ginstate,
- OffsetNumber attnum, Datum key, GinNullCategory category,
- ItemPointerData *ipd, uint32 nipd, bool errorTooBig);
- extern void GinShortenTuple(IndexTuple itup, uint32 nipd);
extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
Datum key, GinNullCategory category,
GinState *ginstate);
extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
extern IndexTuple ginPageGetLinkItup(Buffer buf);
/* gindatapage.c */
extern int ginCompareItemPointers(ItemPointer a, ItemPointer b);
extern uint32 ginMergeItemPointers(ItemPointerData *dst,
ItemPointerData *a, uint32 na,
ItemPointerData *b, uint32 nb);
--- 548,567 ----
extern void ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno);
/* ginentrypage.c */
extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
Datum key, GinNullCategory category,
GinState *ginstate);
extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
extern IndexTuple ginPageGetLinkItup(Buffer buf);
+ extern void ginReadTuple(GinState *ginstate, OffsetNumber attnum,
+ IndexTuple itup, ItemPointerData *ipd);
+ extern ItemPointerData updateItemIndexes(Page page, GinState *ginstate);
/* gindatapage.c */
extern int ginCompareItemPointers(ItemPointer a, ItemPointer b);
+ extern char *ginDataPageLeafWriteItemPointer(char *ptr, ItemPointer iptr, ItemPointer prev);
+ extern Size ginCheckPlaceToDataPageLeaf(ItemPointer iptr, ItemPointer prev,
+ Size size);
extern uint32 ginMergeItemPointers(ItemPointerData *dst,
ItemPointerData *a, uint32 na,
ItemPointerData *b, uint32 nb);
***************
*** 724,727 **** extern void ginHeapTupleFastCollect(GinState *ginstate,
--- 755,805 ----
extern void ginInsertCleanup(GinState *ginstate,
bool vac_delay, IndexBulkDeleteResult *stats);
+ /*
+ * Function for reading packed ItemPointers. Used in various .c files and
+ * have to be inline for being fast.
+ *
+ * Read next item pointer from leaf data page. Replaces current item pointer
+ * with the next one. Zero item pointer should be passed in order to read the
+ * first item pointer.
+ */
+ static inline char *
+ ginDataPageLeafReadItemPointer(char *ptr, ItemPointer iptr)
+ {
+ uint32 blockNumberIncr;
+ uint16 offset;
+ int i;
+ uint8 v;
+
+ i = 0;
+ blockNumberIncr = 0;
+ do
+ {
+ v = *ptr;
+ ptr++;
+ blockNumberIncr |= (uint32) (v & (~HIGHBIT)) << i;
+ i += 7;
+ }
+ while (IS_HIGHBIT_SET(v));
+
+ blockNumberIncr += iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16);
+
+ iptr->ip_blkid.bi_lo = blockNumberIncr & 0xFFFF;
+ iptr->ip_blkid.bi_hi = (blockNumberIncr >> 16) & 0xFFFF;
+
+ i = 0;
+ offset = 0;
+ do
+ {
+ v = *ptr;
+ ptr++;
+ offset |= (uint32) (v & (~HIGHBIT)) << i;
+ i += 7;
+ } while(IS_HIGHBIT_SET(v));
+
+ iptr->ip_posid = offset;
+
+ return ptr;
+ }
+
#endif /* GIN_PRIVATE_H */
On 27.06.2013 17:20, Antonin Houska wrote:
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever
Hmm. It won't loop forever, i is counting the number of entries that fit
on the page, while j is used to loop through the items being added.
However, i isn't actually used for anything (and isn't initialized
either), so it's just dead code.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 06/29/2013 11:00 AM, Heikki Linnakangas wrote:
On 27.06.2013 17:20, Antonin Houska wrote:
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops foreverHmm. It won't loop forever, i is counting the number of entries that
fit on the page, while j is used to loop through the items being
added. However, i isn't actually used for anything (and isn't
initialized either), so it's just dead code.- Heikki
You're right. While thinking about possible meaning of the 'i' I didn't
notice that j++ is in the 'for' construct. Stupid mistake on my side.
Tony
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29.06.2013 11:56, Heikki Linnakangas wrote:
On 25.06.2013 01:03, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Thanks! I'm looking into this in detail now.
First, this patch actually contains two major things:
1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.
Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29.06.2013 20:08, Heikki Linnakangas wrote:
On 29.06.2013 11:56, Heikki Linnakangas wrote:
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..
Here's the correct version. I've probably broken things, sorry about that.
I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates.
Also, please take some time to consider the open questions I listed
here: archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com.
- Heikki
Attachments:
gin-packed-postinglists-2.patchtext/x-diff; name=gin-packed-postinglists-2.patchDownload
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index 67159d8..6ebee1b 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -145,6 +145,7 @@ none appear in the key entry itself. The separate pages are called a
Note that in either case, the ItemPointers associated with a key can
easily be read out in sorted order; this is relied on by the scan
algorithms.
+FIXME: Update above paragraph!
* The index tuple header fields of a leaf key entry are abused as follows:
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index f017de0..a250bfc 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -18,6 +18,97 @@
#include "utils/rel.h"
/*
+ * Write item pointer into leaf data page using varbyte encoding. Since
+ * BlockNumber is stored in incremental manner we also need a previous item
+ * pointer.
+ */
+char *
+ginDataPageLeafWriteItemPointer(char *ptr, ItemPointer iptr, ItemPointer prev)
+{
+ uint32 blockNumberIncr;
+ uint16 offset;
+ uint8 v;
+
+ blockNumberIncr = iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16) -
+ (prev->ip_blkid.bi_lo + (prev->ip_blkid.bi_hi << 16));
+ for (;;)
+ {
+ if (blockNumberIncr < HIGHBIT)
+ {
+ v = (uint8) blockNumberIncr;
+ *ptr = v;
+ ptr++;
+ break;
+ }
+ else
+ {
+ v = ((uint8) blockNumberIncr) | HIGHBIT;
+ *ptr = v;
+ ptr++;
+ blockNumberIncr >>= 7;
+ }
+ }
+
+ offset = iptr->ip_posid;
+ for (;;)
+ {
+ if (offset < HIGHBIT)
+ {
+ v = (uint8) offset;
+ *ptr = v;
+ ptr++;
+ break;
+ }
+ else
+ {
+ v = ((uint8) offset) | HIGHBIT;
+ *ptr = v;
+ ptr++;
+ offset >>= 7;
+ }
+ }
+
+ return ptr;
+}
+
+/*
+ * Calculate size of incremental varbyte encoding of item pointer.
+ */
+static int
+ginDataPageLeafGetItemPointerSize(ItemPointer iptr, ItemPointer prev)
+{
+ uint32 blockNumberIncr;
+ uint16 offset;
+ int size = 0;
+
+ blockNumberIncr = iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16) -
+ (prev->ip_blkid.bi_lo + (prev->ip_blkid.bi_hi << 16));
+ do
+ {
+ size++;
+ blockNumberIncr >>= 7;
+ } while (blockNumberIncr > 0);
+
+ offset = iptr->ip_posid;
+ do
+ {
+ size++;
+ offset >>= 7;
+ } while (offset > 0);
+
+ return size;
+}
+
+/*
+ * Returns size of item pointers if leaf data page after inserting another one.
+ */
+Size
+ginCheckPlaceToDataPageLeaf(ItemPointer iptr, ItemPointer prev, Size size)
+{
+ return size + ginDataPageLeafGetItemPointerSize(iptr, prev);
+}
+
+/*
* Merge two ordered arrays of itempointers, eliminating any duplicates.
* Returns the number of items in the result.
* Caller is responsible that there is enough space at *dst.
@@ -91,12 +182,12 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
if (btree->fullScan)
{
stack->off = FirstOffsetNumber;
- stack->predictNumber *= GinPageGetOpaque(page)->maxoff;
+ stack->predictNumber *= GinPageGetOpaque(page)->u.maxoff;
return btree->getLeftMostPage(btree, page);
}
low = FirstOffsetNumber;
- maxoff = high = GinPageGetOpaque(page)->maxoff;
+ maxoff = high = GinPageGetOpaque(page)->u.maxoff;
Assert(high >= low);
high++;
@@ -105,7 +196,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
{
OffsetNumber mid = low + ((high - low) / 2);
- pitem = (PostingItem *) GinDataPageGetItem(page, mid);
+ pitem = GinDataPageGetItem(page, mid);
if (mid == maxoff)
{
@@ -117,7 +208,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
}
else
{
- pitem = (PostingItem *) GinDataPageGetItem(page, mid);
+ pitem = GinDataPageGetItem(page, mid);
result = ginCompareItemPointers(btree->items + btree->curitem, &(pitem->key));
}
@@ -135,11 +226,82 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
Assert(high >= FirstOffsetNumber && high <= maxoff);
stack->off = high;
- pitem = (PostingItem *) GinDataPageGetItem(page, high);
+ pitem = GinDataPageGetItem(page, high);
+
return PostingItemGetBlockNumber(pitem);
}
/*
+ * Find item pointer in leaf data page. Returns true if given item pointer is
+ * found and false if it's not. Sets offset and iptrOut to last item pointer
+ * which is less than given one. Sets ptrOut ahead that item pointer.
+ */
+static bool
+findInLeafPage(GinBtree btree, Page page, OffsetNumber *offset,
+ ItemPointerData *iptrOut, Pointer *ptrOut)
+{
+ Pointer ptr = GinDataPageGetData(page);
+ int i;
+ OffsetNumber off;
+ ItemPointerData iptr = {{0,0},0};
+ int cmp;
+ Pointer endPtr;
+ bool result = false;
+
+ endPtr = page + GinPageGetOpaque(page)->u.endoffset;
+
+ /*
+ * At first, search index at the end of page. As the result we narrow
+ * [first, maxoff] range.
+ */
+ off = FirstOffsetNumber;
+ for (i = 0; i < GinDataLeafIndexCount; i++)
+ {
+ GinDataLeafItemIndex *index = &GinPageGetIndexes(page)[i];
+ if (index->offsetNumber == InvalidOffsetNumber)
+ break;
+
+ cmp = ginCompareItemPointers(&index->iptr, btree->items + btree->curitem);
+ if (cmp < 0)
+ {
+ ptr = GinDataPageGetData(page) + index->pageOffset;
+ iptr = index->iptr;
+ off = index->offsetNumber;
+ }
+ else
+ {
+ endPtr = page + index->pageOffset;
+ break;
+ }
+ }
+
+ /* Search page in [first, maxoff] range found by page index */
+ while (ptr < endPtr)
+ {
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+
+ cmp = ginCompareItemPointers(btree->items + btree->curitem, &iptr);
+ if (cmp == 0)
+ {
+ result = true;
+ break;
+ }
+ if (cmp < 0)
+ {
+ result = false;
+ break;
+ }
+ off++;
+ }
+
+ *ptrOut = ptr;
+ *iptrOut = iptr;
+ *offset = off;
+ return result;
+}
+
+
+/*
* Searches correct position for value on leaf page.
* Page should be correctly chosen.
* Returns true if value found on page.
@@ -148,9 +310,8 @@ static bool
dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
{
Page page = BufferGetPage(stack->buffer);
- OffsetNumber low,
- high;
- int result;
+ ItemPointerData iptr;
+ Pointer ptr;
Assert(GinPageIsLeaf(page));
Assert(GinPageIsData(page));
@@ -161,36 +322,7 @@ dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
return TRUE;
}
- low = FirstOffsetNumber;
- high = GinPageGetOpaque(page)->maxoff;
-
- if (high < low)
- {
- stack->off = FirstOffsetNumber;
- return false;
- }
-
- high++;
-
- while (high > low)
- {
- OffsetNumber mid = low + ((high - low) / 2);
-
- result = ginCompareItemPointers(btree->items + btree->curitem, (ItemPointer) GinDataPageGetItem(page, mid));
-
- if (result == 0)
- {
- stack->off = mid;
- return true;
- }
- else if (result > 0)
- low = mid + 1;
- else
- high = mid;
- }
-
- stack->off = high;
- return false;
+ return findInLeafPage(btree, page, &stack->off, &iptr, &ptr);
}
/*
@@ -201,7 +333,7 @@ static OffsetNumber
dataFindChildPtr(GinBtree btree, Page page, BlockNumber blkno, OffsetNumber storedOff)
{
OffsetNumber i,
- maxoff = GinPageGetOpaque(page)->maxoff;
+ maxoff = GinPageGetOpaque(page)->u.maxoff;
PostingItem *pitem;
Assert(!GinPageIsLeaf(page));
@@ -210,7 +342,7 @@ dataFindChildPtr(GinBtree btree, Page page, BlockNumber blkno, OffsetNumber stor
/* if page isn't changed, we return storedOff */
if (storedOff >= FirstOffsetNumber && storedOff <= maxoff)
{
- pitem = (PostingItem *) GinDataPageGetItem(page, storedOff);
+ pitem = GinDataPageGetItem(page, storedOff);
if (PostingItemGetBlockNumber(pitem) == blkno)
return storedOff;
@@ -220,7 +352,7 @@ dataFindChildPtr(GinBtree btree, Page page, BlockNumber blkno, OffsetNumber stor
*/
for (i = storedOff + 1; i <= maxoff; i++)
{
- pitem = (PostingItem *) GinDataPageGetItem(page, i);
+ pitem = GinDataPageGetItem(page, i);
if (PostingItemGetBlockNumber(pitem) == blkno)
return i;
}
@@ -231,7 +363,7 @@ dataFindChildPtr(GinBtree btree, Page page, BlockNumber blkno, OffsetNumber stor
/* last chance */
for (i = FirstOffsetNumber; i <= maxoff; i++)
{
- pitem = (PostingItem *) GinDataPageGetItem(page, i);
+ pitem = GinDataPageGetItem(page, i);
if (PostingItemGetBlockNumber(pitem) == blkno)
return i;
}
@@ -249,37 +381,38 @@ dataGetLeftMostPage(GinBtree btree, Page page)
Assert(!GinPageIsLeaf(page));
Assert(GinPageIsData(page));
- Assert(GinPageGetOpaque(page)->maxoff >= FirstOffsetNumber);
+ Assert(GinPageGetOpaque(page)->u.maxoff >= FirstOffsetNumber);
- pitem = (PostingItem *) GinDataPageGetItem(page, FirstOffsetNumber);
+ pitem = GinDataPageGetItem(page, FirstOffsetNumber);
return PostingItemGetBlockNumber(pitem);
}
/*
- * add ItemPointer or PostingItem to page. data should point to
+ * add PostingItem to page. data should point to
* correct value! depending on leaf or non-leaf page
*/
void
-GinDataPageAddItem(Page page, void *data, OffsetNumber offset)
+GinPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset)
{
- OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
- char *ptr;
+ OffsetNumber maxoff = GinPageGetOpaque(page)->u.maxoff;
+ PostingItem *ptr;
+
+ Assert(!GinPageIsLeaf(page));
if (offset == InvalidOffsetNumber)
- {
ptr = GinDataPageGetItem(page, maxoff + 1);
- }
else
{
+ Assert(offset <= maxoff + 1);
ptr = GinDataPageGetItem(page, offset);
- if (maxoff + 1 - offset != 0)
- memmove(ptr + GinSizeOfDataPageItem(page),
+ if (offset != maxoff + 1)
+ memmove(GinDataPageGetItem(page, offset +1),
ptr,
- (maxoff - offset + 1) * GinSizeOfDataPageItem(page));
+ (maxoff - offset + 1) * sizeof(PostingItem));
}
- memcpy(ptr, data, GinSizeOfDataPageItem(page));
+ memcpy(ptr, data, sizeof(PostingItem));
- GinPageGetOpaque(page)->maxoff++;
+ GinPageGetOpaque(page)->u.maxoff++;
}
/*
@@ -288,16 +421,17 @@ GinDataPageAddItem(Page page, void *data, OffsetNumber offset)
void
GinPageDeletePostingItem(Page page, OffsetNumber offset)
{
- OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
+ OffsetNumber maxoff = GinPageGetOpaque(page)->u.maxoff;
Assert(!GinPageIsLeaf(page));
Assert(offset >= FirstOffsetNumber && offset <= maxoff);
if (offset != maxoff)
- memmove(GinDataPageGetItem(page, offset), GinDataPageGetItem(page, offset + 1),
+ memmove(GinDataPageGetItem(page, offset),
+ GinDataPageGetItem(page, offset + 1),
sizeof(PostingItem) * (maxoff - offset));
- GinPageGetOpaque(page)->maxoff--;
+ GinPageGetOpaque(page)->u.maxoff--;
}
/*
@@ -314,15 +448,26 @@ dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
if (GinPageIsLeaf(page))
{
- if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
+ int j;
+ ItemPointerData iptr = {{0,0},0};
+ Size size = 0;
+
+ /*
+ * Calculate additional size using worst case assumption: varbyte
+ * encoding from zero item pointer.
+ */
+ for (j = btree->curitem; j < btree->nitem; j++)
{
- if ((btree->nitem - btree->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
- return true;
+ size = ginCheckPlaceToDataPageLeaf(&btree->items[j],
+ (j == btree->curitem) ? (&iptr) : &btree->items[j - 1],
+ size);
}
- else if (sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
+
+ if (GinLeafDataPageGetFreeSpace(page) >= size)
return true;
+
}
- else if (sizeof(PostingItem) <= GinDataPageGetFreeSpace(page))
+ else if (sizeof(PostingItem) <= GinNonLeafDataPageGetFreeSpace(page))
return true;
return false;
@@ -361,12 +506,12 @@ static void
dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prdata)
{
Page page = BufferGetPage(buf);
- int sizeofitem = GinSizeOfDataPageItem(page);
int cnt = 0;
/* these must be static so they can be returned to caller */
static XLogRecData rdata[3];
static ginxlogInsert data;
+ static char insertData[BLCKSZ];
*prdata = rdata;
Assert(GinPageIsData(page));
@@ -376,7 +521,7 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
data.node = btree->index->rd_node;
data.blkno = BufferGetBlockNumber(buf);
data.offset = off;
- data.nitem = 1;
+ data.nitem = 0;
data.isDelete = FALSE;
data.isData = TRUE;
data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
@@ -404,35 +549,343 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
rdata[cnt].next = &rdata[cnt + 1];
cnt++;
- rdata[cnt].buffer = InvalidBuffer;
- rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem));
- rdata[cnt].len = sizeofitem;
- rdata[cnt].next = NULL;
-
if (GinPageIsLeaf(page))
{
- if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
+ int i = 0, j, max_j;
+ Pointer ptr = GinDataPageGetData(page), next_ptr, insertStart;
+ ItemPointerData iptr = {{0,0},0}, next_iptr;
+ char pageCopy[BLCKSZ];
+ int copySize = 0;
+ char *endPtr = page + GinPageGetOpaque(page)->u.endoffset;
+ bool append = true;
+
+ /*
+ * We're going to prevent var-byte re-encoding of whole page.
+ * Find position in page using page indexes.
+ */
+ findInLeafPage(btree, page, &off, &iptr, &ptr);
+
+ Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
+
+ if (ptr < endPtr)
{
- /* usually, create index... */
- uint32 savedPos = btree->curitem;
+ /*
+ * Read next item pointer: we'll have to re-encode it. Copy
+ * previous part of page
+ */
+ next_iptr = iptr;
+ next_ptr = ginDataPageLeafReadItemPointer(ptr, &next_iptr);
+ copySize = GinDataPageSize - GinLeafDataPageGetFreeSpace(page) -
+ (next_ptr - GinDataPageGetData(page));
+ memcpy(pageCopy, next_ptr, copySize);
+ append = false;
+ }
- while (btree->curitem < btree->nitem)
+ /* Check how many items we're going to add */
+ max_j = btree->curitem + btree->nitem;
+
+ /* Place items to the page while we have enough of space */
+ memcpy(insertData, &iptr, sizeof(ItemPointerData));
+ insertStart = ptr;
+ i = 0;
+ for (j = btree->curitem; j < max_j; j++)
+ {
+ Pointer ptr2;
+
+ ptr2 = page + ginCheckPlaceToDataPageLeaf(&btree->items[j],
+ &iptr, ptr - page);
+
+ if (GinDataPageFreeSpacePre(page, ptr2) < 0)
+ break;
+
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &btree->items[j], &iptr);
+ Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
+
+ iptr = btree->items[j];
+ btree->curitem++;
+ data.nitem++;
+ i++;
+ }
+ Assert(i > 0);
+
+ /* Put WAL data */
+ memcpy(insertData + sizeof(ItemPointerData), insertStart,
+ ptr - insertStart);
+ rdata[cnt].buffer = InvalidBuffer;
+ rdata[cnt].data = insertData;
+ rdata[cnt].len = sizeof(ItemPointerData) + (ptr - insertStart);
+ rdata[cnt].next = NULL;
+
+ /* Place rest of the page back */
+ if (!append)
+ {
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &next_iptr, &iptr);
+ Assert(GinDataPageFreeSpacePre(page,ptr) >= 0);
+ memcpy(ptr, pageCopy, copySize);
+ ptr += copySize;
+ }
+
+ GinPageGetOpaque(page)->u.endoffset = ptr - page;
+
+ if (GinDataPageFreeSpacePre(page,ptr) < 0)
+ elog(ERROR, "not enough space in leaf page!");
+
+ /* Update indexes in the end of page */
+ updateItemIndexes(page, btree->ginstate);
+ }
+ else
+ {
+ rdata[cnt].buffer = InvalidBuffer;
+ rdata[cnt].data = (char *) &(btree->pitem);
+ rdata[cnt].len = sizeof(PostingItem);
+ rdata[cnt].next = NULL;
+ data.nitem = 1;
+
+ GinPageAddPostingItem(page, &(btree->pitem), off);
+ }
+}
+
+/* Macro for leaf data page split: switch to right page if needed. */
+#define CHECK_SWITCH_TO_RPAGE \
+ do { \
+ if (ptr - GinDataPageGetData(page) > \
+ totalsize / 2 && page == lpage) \
+ { \
+ maxLeftIptr = iptr; \
+ prevIptr.ip_blkid.bi_hi = 0; \
+ prevIptr.ip_blkid.bi_lo = 0; \
+ prevIptr.ip_posid = 0; \
+ GinPageGetOpaque(lpage)->u.endoffset = ptr - page; \
+ page = rpage; \
+ ptr = GinDataPageGetData(rpage); \
+ separator = j; \
+ j = FirstOffsetNumber; \
+ } \
+ else \
+ { \
+ j++; \
+ } \
+ } while (0)
+
+
+
+/*
+ * Place tuple and split page, original buffer(lbuf) leaves untouched,
+ * returns shadow page of lbuf filled new data.
+ */
+static Page
+dataSplitPageLeaf(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+ XLogRecData **prdata)
+{
+ OffsetNumber i, j;
+ Size totalsize = 0, prevTotalsize;
+ Pointer ptr, copyPtr, copyEndPtr;
+ Page page;
+ Page lpage = PageGetTempPageCopy(BufferGetPage(lbuf));
+ Page rpage = BufferGetPage(rbuf);
+ Size pageSize = PageGetPageSize(lpage);
+ Size maxItemSize = 0;
+ ItemPointerData iptr, prevIptr, maxLeftIptr;
+ int totalCount = 0;
+ int maxItemIndex = btree->curitem;
+ int separator = 0;
+
+ /* these must be static so they can be returned to caller */
+ static XLogRecData rdata[3];
+ static ginxlogSplit data;
+ static char lpageCopy[BLCKSZ];
+
+ *prdata = rdata;
+ data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
+ InvalidOffsetNumber : GinGetDownlink(btree->entry);
+ data.updateBlkno = dataPrepareData(btree, lpage, off);
+
+ /* Copy original data of the page */
+ memcpy(lpageCopy, lpage, BLCKSZ);
+
+ /* Reinitialize pages */
+ GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
+ GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);
+
+ GinPageGetOpaque(lpage)->u.endoffset = GinDataPageGetData(lpage) - lpage;
+ GinPageGetOpaque(rpage)->u.endoffset = GinDataPageGetData(rpage) - rpage;
+
+ /* Calculate the whole size we're going to place */
+ copyPtr = GinDataPageGetData(lpageCopy);
+ copyEndPtr = lpageCopy + GinPageGetOpaque(lpageCopy)->u.endoffset;
+ iptr.ip_blkid.bi_hi = 0;
+ iptr.ip_blkid.bi_lo = 0;
+ iptr.ip_posid = 0;
+ i = FirstOffsetNumber;
+ while (copyPtr < copyEndPtr)
+ {
+ if (i == off)
+ {
+ prevIptr = iptr;
+ iptr = btree->items[maxItemIndex];
+
+ prevTotalsize = totalsize;
+ totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
+
+ maxItemIndex++;
+ totalCount++;
+ maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
+ }
+
+ prevIptr = iptr;
+ copyPtr = ginDataPageLeafReadItemPointer(copyPtr, &iptr);
+
+ prevTotalsize = totalsize;
+ totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
+
+ totalCount++;
+ maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
+ i++;
+ }
+
+ Assert(copyPtr <= copyEndPtr);
+ if (copyPtr == copyEndPtr)
+ {
+ prevIptr = iptr;
+ iptr = btree->items[maxItemIndex];
+ if (GinPageRightMost(lpage))
+ {
+ Size newTotalsize;
+
+ /*
+ * Found how many new item pointer we're going to add using
+ * worst case assumptions about odd placement and alignment.
+ */
+ while (maxItemIndex < btree->nitem &&
+ (newTotalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize)) <
+ 2 * GinDataPageSize - 2 * maxItemSize - 2 * MAXIMUM_ALIGNOF
+ )
{
- GinDataPageAddItem(page, btree->items + btree->curitem, off);
- off++;
- btree->curitem++;
+ maxItemIndex++;
+ totalCount++;
+ maxItemSize = Max(maxItemSize, newTotalsize - totalsize);
+ totalsize = newTotalsize;
+
+ prevIptr = iptr;
+ if (maxItemIndex < btree->nitem)
+ iptr = btree->items[maxItemIndex];
}
- data.nitem = btree->curitem - savedPos;
- rdata[cnt].len = sizeofitem * data.nitem;
}
else
{
- GinDataPageAddItem(page, btree->items + btree->curitem, off);
- btree->curitem++;
+ prevTotalsize = totalsize;
+ totalsize = ginCheckPlaceToDataPageLeaf(&iptr, &prevIptr, totalsize);
+ maxItemIndex++;
+
+ totalCount++;
+ maxItemSize = Max(maxItemSize, totalsize - prevTotalsize);
}
}
- else
- GinDataPageAddItem(page, &(btree->pitem), off);
+
+ /*
+ * Place item pointers with additional information to the pages using
+ * previous calculations. XXX: what does this do now that I removed the
+ * additional information stuff from the patch?
+ */
+ ptr = GinDataPageGetData(lpage);
+ page = lpage;
+ j = FirstOffsetNumber;
+ iptr.ip_blkid.bi_hi = 0;
+ iptr.ip_blkid.bi_lo = 0;
+ iptr.ip_posid = 0;
+ prevIptr = iptr;
+ copyPtr = GinDataPageGetData(lpageCopy);
+ i = 0;
+ while (copyPtr < copyEndPtr)
+ {
+ if (i == off)
+ {
+ while (btree->curitem < maxItemIndex)
+ {
+ iptr = btree->items[btree->curitem];
+
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
+ Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
+
+ btree->curitem++;
+ prevIptr = iptr;
+
+ CHECK_SWITCH_TO_RPAGE;
+ i++;
+ }
+ }
+
+ copyPtr = ginDataPageLeafReadItemPointer(copyPtr, &iptr);
+
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
+ Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
+
+ prevIptr = iptr;
+
+ CHECK_SWITCH_TO_RPAGE;
+ i++;
+ }
+
+ /*
+ * If we didn't add the new items yet, they belong at the end. Insert them
+ * there.
+ */
+ while (btree->curitem < maxItemIndex)
+ {
+ iptr = btree->items[btree->curitem];
+
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &iptr, &prevIptr);
+ Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
+
+ btree->curitem++;
+
+ prevIptr = iptr;
+
+ CHECK_SWITCH_TO_RPAGE;
+ i++;
+ }
+
+ Assert(page == rpage);
+ GinPageGetOpaque(rpage)->u.endoffset = ptr - rpage;
+
+ PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
+ btree->pitem.key = maxLeftIptr;
+ btree->rightblkno = BufferGetBlockNumber(rbuf);
+
+ *GinDataPageGetRightBound(rpage) = *GinDataPageGetRightBound(lpage);
+ *GinDataPageGetRightBound(lpage) = maxLeftIptr;
+
+ /* Fill indexes at the end of pages */
+ updateItemIndexes(lpage, btree->ginstate);
+ updateItemIndexes(rpage, btree->ginstate);
+
+ data.node = btree->index->rd_node;
+ data.rootBlkno = InvalidBlockNumber;
+ data.lblkno = BufferGetBlockNumber(lbuf);
+ data.rblkno = BufferGetBlockNumber(rbuf);
+ data.separator = separator;
+ data.nitem = i;
+ data.isData = TRUE;
+ data.isLeaf = TRUE;
+ data.isRootSplit = FALSE;
+ data.rightbound = *GinDataPageGetRightBound(rpage);
+
+ rdata[0].buffer = InvalidBuffer;
+ rdata[0].data = (char *) &data;
+ rdata[0].len = sizeof(ginxlogSplit);
+ rdata[0].next = &rdata[1];
+
+ rdata[1].buffer = InvalidBuffer;
+ rdata[1].data = GinDataPageGetData(lpage);
+ rdata[1].len = GinDataPageSize - GinLeafDataPageGetFreeSpace(lpage);
+ rdata[1].next = &rdata[2];
+
+ rdata[2].buffer = InvalidBuffer;
+ rdata[2].data = GinDataPageGetData(rpage);
+ rdata[2].len = GinDataPageSize - GinLeafDataPageGetFreeSpace(rpage);
+ rdata[2].next = NULL;
+
+ return lpage;
}
/*
@@ -442,19 +895,18 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda
* left page
*/
static Page
-dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
+dataSplitPageInternal(GinBtree btree, Buffer lbuf, Buffer rbuf,
+ OffsetNumber off, XLogRecData **prdata)
{
char *ptr;
OffsetNumber separator;
ItemPointer bound;
Page lpage = PageGetTempPageCopy(BufferGetPage(lbuf));
ItemPointerData oldbound = *GinDataPageGetRightBound(lpage);
- int sizeofitem = GinSizeOfDataPageItem(lpage);
- OffsetNumber maxoff = GinPageGetOpaque(lpage)->maxoff;
+ OffsetNumber maxoff = GinPageGetOpaque(lpage)->u.maxoff;
Page rpage = BufferGetPage(rbuf);
Size pageSize = PageGetPageSize(lpage);
Size freeSpace;
- uint32 nCopied = 1;
/* these must be static so they can be returned to caller */
static ginxlogSplit data;
@@ -462,7 +914,7 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
static char vector[2 * BLCKSZ];
GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
- freeSpace = GinDataPageGetFreeSpace(rpage);
+ freeSpace = GinNonLeafDataPageGetFreeSpace(rpage);
*prdata = rdata;
data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
@@ -470,63 +922,36 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
data.updateBlkno = dataPrepareData(btree, lpage, off);
memcpy(vector, GinDataPageGetItem(lpage, FirstOffsetNumber),
- maxoff * sizeofitem);
+ maxoff * sizeof(PostingItem));
- if (GinPageIsLeaf(lpage) && GinPageRightMost(lpage) && off > GinPageGetOpaque(lpage)->maxoff)
- {
- nCopied = 0;
- while (btree->curitem < btree->nitem &&
- maxoff * sizeof(ItemPointerData) < 2 * (freeSpace - sizeof(ItemPointerData)))
- {
- memcpy(vector + maxoff * sizeof(ItemPointerData),
- btree->items + btree->curitem,
- sizeof(ItemPointerData));
- maxoff++;
- nCopied++;
- btree->curitem++;
- }
- }
- else
- {
- ptr = vector + (off - 1) * sizeofitem;
- if (maxoff + 1 - off != 0)
- memmove(ptr + sizeofitem, ptr, (maxoff - off + 1) * sizeofitem);
- if (GinPageIsLeaf(lpage))
- {
- memcpy(ptr, btree->items + btree->curitem, sizeofitem);
- btree->curitem++;
- }
- else
- memcpy(ptr, &(btree->pitem), sizeofitem);
+ ptr = vector + (off - 1) * sizeof(PostingItem);
+ if (maxoff + 1 - off != 0)
+ memmove(ptr + sizeof(PostingItem), ptr, (maxoff - off + 1) * sizeof(PostingItem));
+ memcpy(ptr, &(btree->pitem), sizeof(PostingItem));
- maxoff++;
- }
+ maxoff++;
/*
* we suppose that during index creation table scaned from begin to end,
* so ItemPointers are monotonically increased..
*/
if (btree->isBuild && GinPageRightMost(lpage))
- separator = freeSpace / sizeofitem;
+ separator = freeSpace / sizeof(PostingItem);
else
separator = maxoff / 2;
GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);
- memcpy(GinDataPageGetItem(lpage, FirstOffsetNumber), vector, separator * sizeofitem);
- GinPageGetOpaque(lpage)->maxoff = separator;
+ memcpy(GinDataPageGetItem(lpage, FirstOffsetNumber), vector, separator * sizeof(PostingItem));
+ GinPageGetOpaque(lpage)->u.maxoff = separator;
memcpy(GinDataPageGetItem(rpage, FirstOffsetNumber),
- vector + separator * sizeofitem, (maxoff - separator) * sizeofitem);
- GinPageGetOpaque(rpage)->maxoff = maxoff - separator;
+ vector + separator * sizeof(PostingItem), (maxoff - separator) * sizeof(PostingItem));
+ GinPageGetOpaque(rpage)->u.maxoff = maxoff - separator;
PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
- if (GinPageIsLeaf(lpage))
- btree->pitem.key = *(ItemPointerData *) GinDataPageGetItem(lpage,
- GinPageGetOpaque(lpage)->maxoff);
- else
- btree->pitem.key = ((PostingItem *) GinDataPageGetItem(lpage,
- GinPageGetOpaque(lpage)->maxoff))->key;
+ btree->pitem.key = ((PostingItem *) GinDataPageGetItem(lpage,
+ GinPageGetOpaque(lpage)->u.maxoff))->key;
btree->rightblkno = BufferGetBlockNumber(rbuf);
/* set up right bound for left page */
@@ -544,7 +969,7 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
data.separator = separator;
data.nitem = maxoff;
data.isData = TRUE;
- data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
+ data.isLeaf = FALSE;
data.isRootSplit = FALSE;
data.rightbound = oldbound;
@@ -555,13 +980,83 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
rdata[1].buffer = InvalidBuffer;
rdata[1].data = vector;
- rdata[1].len = MAXALIGN(maxoff * sizeofitem);
+ rdata[1].len = MAXALIGN(maxoff * sizeof(PostingItem));
rdata[1].next = NULL;
return lpage;
}
/*
+ * Split page of posting tree. Calls relevant function of internal of leaf page
+ * because they are handled very different.
+ */
+static Page
+dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+ XLogRecData **prdata)
+{
+ if (GinPageIsLeaf(BufferGetPage(lbuf)))
+ return dataSplitPageLeaf(btree, lbuf, rbuf, off, prdata);
+ else
+ return dataSplitPageInternal(btree, lbuf, rbuf, off, prdata);
+}
+
+/*
+ * Updates indexes in the end of leaf page which are used for faster search.
+ * Also updates freespace opaque field of page. Returns last item pointer of
+ * page.
+ */
+ItemPointerData
+updateItemIndexes(Page page, GinState *ginstate)
+{
+ Pointer beginptr;
+ Pointer ptr;
+ Pointer endptr;
+ Pointer nextptr;
+ ItemPointerData iptr;
+ int j = 0, i;
+ int totalsize;
+
+ Assert(GinPageIsLeaf(page));
+
+ /* Iterate over page */
+
+ ptr = beginptr = GinDataPageGetData(page);
+ endptr = page + GinPageGetOpaque(page)->u.endoffset;
+ iptr.ip_blkid.bi_lo = 0;
+ iptr.ip_blkid.bi_hi = 0;
+ iptr.ip_posid = 0;
+
+ totalsize = endptr - beginptr;
+ nextptr = beginptr + (int) ((double) totalsize / (double) (GinDataLeafIndexCount + 1));
+ j = 0;
+ i = FirstOffsetNumber;
+ while (ptr < endptr && j < GinDataLeafIndexCount)
+ {
+ /* Place next page index entry if it's time to */
+ if (ptr >= nextptr)
+ {
+ GinPageGetIndexes(page)[j].iptr = iptr;
+ GinPageGetIndexes(page)[j].offsetNumber = i;
+ GinPageGetIndexes(page)[j].pageOffset = ptr - GinDataPageGetData(page);
+ j++;
+ nextptr = beginptr + (int) ((double) (j + 1) * (double) totalsize / (double) (GinDataLeafIndexCount + 1));
+ }
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ i++;
+ }
+ /* Fill rest of page indexes with InvalidOffsetNumber if any */
+ for (; j < GinDataLeafIndexCount; j++)
+ {
+ GinDataLeafItemIndex *idx = &GinPageGetIndexes(page)[j];
+ MemSet(&idx->iptr, 0, sizeof(ItemPointerData));
+ idx->offsetNumber = InvalidOffsetNumber;
+ idx->pageOffset = 0;
+ }
+
+ return iptr;
+}
+
+/*
* Fills new root by right bound values from child.
* Also called from ginxlog, should not use btree
*/
@@ -576,11 +1071,11 @@ ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
li.key = *GinDataPageGetRightBound(lpage);
PostingItemSetBlockNumber(&li, BufferGetBlockNumber(lbuf));
- GinDataPageAddItem(page, &li, InvalidOffsetNumber);
+ GinPageAddPostingItem(page, &li, InvalidOffsetNumber);
ri.key = *GinDataPageGetRightBound(rpage);
PostingItemSetBlockNumber(&ri, BufferGetBlockNumber(rbuf));
- GinDataPageAddItem(page, &ri, InvalidOffsetNumber);
+ GinPageAddPostingItem(page, &ri, InvalidOffsetNumber);
}
void
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 7733028..f069ea6 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -18,151 +18,24 @@
#include "utils/rel.h"
/*
- * Form a tuple for entry tree.
- *
- * If the tuple would be too big to be stored, function throws a suitable
- * error if errorTooBig is TRUE, or returns NULL if errorTooBig is FALSE.
- *
- * See src/backend/access/gin/README for a description of the index tuple
- * format that is being built here. We build on the assumption that we
- * are making a leaf-level key entry containing a posting list of nipd items.
- * If the caller is actually trying to make a posting-tree entry, non-leaf
- * entry, or pending-list entry, it should pass nipd = 0 and then overwrite
- * the t_tid fields as necessary. In any case, ipd can be NULL to skip
- * copying any itempointers into the posting list; the caller is responsible
- * for filling the posting list afterwards, if ipd = NULL and nipd > 0.
+ * Read item pointers from leaf data page. Information is stored in the same
+ * manner as in leaf data pages.
*/
-IndexTuple
-GinFormTuple(GinState *ginstate,
- OffsetNumber attnum, Datum key, GinNullCategory category,
- ItemPointerData *ipd, uint32 nipd,
- bool errorTooBig)
+void
+ginReadTuple(GinState *ginstate, OffsetNumber attnum,
+ IndexTuple itup, ItemPointerData *ipd)
{
- Datum datums[2];
- bool isnull[2];
- IndexTuple itup;
- uint32 newsize;
-
- /* Build the basic tuple: optional column number, plus key datum */
- if (ginstate->oneCol)
- {
- datums[0] = key;
- isnull[0] = (category != GIN_CAT_NORM_KEY);
- }
- else
- {
- datums[0] = UInt16GetDatum(attnum);
- isnull[0] = false;
- datums[1] = key;
- isnull[1] = (category != GIN_CAT_NORM_KEY);
- }
-
- itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
-
- /*
- * Determine and store offset to the posting list, making sure there is
- * room for the category byte if needed.
- *
- * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
- * be some wasted pad space. Is it worth recomputing the data length to
- * prevent that? That would also allow us to Assert that the real data
- * doesn't overlap the GinNullCategory byte, which this code currently
- * takes on faith.
- */
- newsize = IndexTupleSize(itup);
-
- if (IndexTupleHasNulls(itup))
- {
- uint32 minsize;
-
- Assert(category != GIN_CAT_NORM_KEY);
- minsize = GinCategoryOffset(itup, ginstate) + sizeof(GinNullCategory);
- newsize = Max(newsize, minsize);
- }
-
- newsize = SHORTALIGN(newsize);
-
- GinSetPostingOffset(itup, newsize);
-
- GinSetNPosting(itup, nipd);
-
- /*
- * Add space needed for posting list, if any. Then check that the tuple
- * won't be too big to store.
- */
- newsize += sizeof(ItemPointerData) * nipd;
- newsize = MAXALIGN(newsize);
- if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
- {
- if (errorTooBig)
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
- (unsigned long) newsize,
- (unsigned long) Min(INDEX_SIZE_MASK,
- GinMaxItemSize),
- RelationGetRelationName(ginstate->index))));
- pfree(itup);
- return NULL;
- }
+ Pointer ptr;
+ int nipd = GinGetNPosting(itup), i;
+ ItemPointerData ip = {{0,0},0};
- /*
- * Resize tuple if needed
- */
- if (newsize != IndexTupleSize(itup))
- {
- itup = repalloc(itup, newsize);
- /*
- * PostgreSQL 9.3 and earlier did not clear this new space, so we
- * might find uninitialized padding when reading tuples from disk.
- */
- memset((char *) itup + IndexTupleSize(itup),
- 0, newsize - IndexTupleSize(itup));
+ ptr = GinGetPosting(itup);
- /* set new size in tuple header */
- itup->t_info &= ~INDEX_SIZE_MASK;
- itup->t_info |= newsize;
- }
-
- /*
- * Insert category byte, if needed
- */
- if (category != GIN_CAT_NORM_KEY)
+ for (i = 0; i < nipd; i++)
{
- Assert(IndexTupleHasNulls(itup));
- GinSetNullCategory(itup, ginstate, category);
+ ptr = ginDataPageLeafReadItemPointer(ptr, &ip);
+ ipd[i] = ip;
}
-
- /*
- * Copy in the posting list, if provided
- */
- if (ipd)
- memcpy(GinGetPosting(itup), ipd, sizeof(ItemPointerData) * nipd);
-
- return itup;
-}
-
-/*
- * Sometimes we reduce the number of posting list items in a tuple after
- * having built it with GinFormTuple. This function adjusts the size
- * fields to match.
- */
-void
-GinShortenTuple(IndexTuple itup, uint32 nipd)
-{
- uint32 newsize;
-
- Assert(nipd <= GinGetNPosting(itup));
-
- newsize = GinGetPostingOffset(itup) + sizeof(ItemPointerData) * nipd;
- newsize = MAXALIGN(newsize);
-
- Assert(newsize <= (itup->t_info & INDEX_SIZE_MASK));
-
- itup->t_info &= ~INDEX_SIZE_MASK;
- itup->t_info |= newsize;
-
- GinSetNPosting(itup, nipd);
}
/*
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 4f2c118..49f799d 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -23,6 +23,7 @@
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "access/htup_details.h"
#define GIN_PAGE_FREESIZE \
@@ -94,11 +95,11 @@ writeListPage(Relation index, Buffer buffer,
if (rightlink == InvalidBlockNumber)
{
GinPageSetFullRow(page);
- GinPageGetOpaque(page)->maxoff = 1;
+ GinPageGetOpaque(page)->u.maxoff = 1;
}
else
{
- GinPageGetOpaque(page)->maxoff = 0;
+ GinPageGetOpaque(page)->u.maxoff = 0;
}
MarkBufferDirty(buffer);
@@ -368,8 +369,8 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
/*
* Increase counter of heap tuples
*/
- Assert(GinPageGetOpaque(page)->maxoff <= metadata->nPendingHeapTuples);
- GinPageGetOpaque(page)->maxoff++;
+ Assert(GinPageGetOpaque(page)->u.maxoff <= metadata->nPendingHeapTuples);
+ GinPageGetOpaque(page)->u.maxoff++;
metadata->nPendingHeapTuples++;
for (i = 0; i < collector->ntuples; i++)
@@ -437,6 +438,89 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
ginInsertCleanup(ginstate, false, NULL);
}
+static IndexTuple
+GinFastFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category)
+{
+ Datum datums[2];
+ bool isnull[2];
+ IndexTuple itup;
+ uint32 newsize;
+
+ /* Build the basic tuple: optional column number, plus key datum */
+
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Place category to the last byte of index tuple extending it's size if
+ * needed
+ */
+ newsize = IndexTupleSize(itup);
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ uint32 minsize;
+
+ Assert(IndexTupleHasNulls(itup));
+ minsize = IndexInfoFindDataOffset(itup->t_info) +
+ heap_compute_data_size(ginstate->tupdesc[attnum - 1], datums, isnull) +
+ sizeof(GinNullCategory);
+ newsize = Max(newsize, minsize);
+ }
+
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+
+ return itup;
+}
+
+
/*
* Create temporary index tuples for a single indexable item (one index column
* for the heap tuple specified by ht_ctid), and append them to the array
@@ -486,8 +570,7 @@ ginHeapTupleFastCollect(GinState *ginstate,
{
IndexTuple itup;
- itup = GinFormTuple(ginstate, attnum, entries[i], categories[i],
- NULL, 0, true);
+ itup = GinFastFormTuple(ginstate, attnum, entries[i], categories[i]);
itup->t_tid = *ht_ctid;
collector->tuples[collector->ntuples++] = itup;
collector->sumsize += IndexTupleSize(itup);
@@ -550,7 +633,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
return true;
}
- nDeletedHeapTuples += GinPageGetOpaque(page)->maxoff;
+ nDeletedHeapTuples += GinPageGetOpaque(page)->u.maxoff;
blknoToDelete = GinPageGetOpaque(page)->rightlink;
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index cb17d38..7422f52 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -71,20 +71,25 @@ callConsistentFn(GinState *ginstate, GinScanKey key)
static bool
findItemInPostingPage(Page page, ItemPointer item, OffsetNumber *off)
{
- OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
int res;
+ Pointer ptr;
+ Pointer endPtr;
+ ItemPointerData iptr = {{0, 0}, 0};
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
/* page was deleted by concurrent vacuum */
return false;
+ ptr = GinDataPageGetData(page);
+ endPtr = page + GinPageGetOpaque(page)->u.endoffset;
/*
* scan page to find equal or first greater value
*/
- for (*off = FirstOffsetNumber; *off <= maxoff; (*off)++)
+ while (ptr < endPtr)
{
- res = ginCompareItemPointers(item, (ItemPointer) GinDataPageGetItem(page, *off));
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ res = ginCompareItemPointers(item, &iptr);
if (res <= 0)
return true;
}
@@ -148,15 +153,27 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
*/
for (;;)
{
- page = BufferGetPage(buffer);
+ Pointer ptr;
+ Pointer endPtr;
+ page = BufferGetPage(buffer);
+ ptr = GinDataPageGetData(page);
+ endPtr = page + GinPageGetOpaque(page)->u.endoffset;
if ((GinPageGetOpaque(page)->flags & GIN_DELETED) == 0 &&
- GinPageGetOpaque(page)->maxoff >= FirstOffsetNumber)
+ ptr < endPtr)
{
- tbm_add_tuples(scanEntry->matchBitmap,
- (ItemPointer) GinDataPageGetItem(page, FirstOffsetNumber),
- GinPageGetOpaque(page)->maxoff, false);
- scanEntry->predictNumberResult += GinPageGetOpaque(page)->maxoff;
+ ItemPointerData iptr = {{0, 0}, 0};
+ int i;
+
+ i = 0;
+ while (ptr < endPtr)
+ {
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ tbm_add_tuples(scanEntry->matchBitmap, &iptr, 1, false);
+ i++;
+ }
+
+ scanEntry->predictNumberResult += i;
}
if (GinPageRightMost(page))
@@ -344,8 +361,12 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
}
else
{
+ ItemPointerData *ipd = (ItemPointerData *)palloc(
+ sizeof(ItemPointerData) * GinGetNPosting(itup));
+ ginReadTuple(btree->ginstate, scanEntry->attnum, itup, ipd);
+
tbm_add_tuples(scanEntry->matchBitmap,
- GinGetPosting(itup), GinGetNPosting(itup), false);
+ ipd, GinGetNPosting(itup), false);
scanEntry->predictNumberResult += GinGetNPosting(itup);
}
@@ -438,6 +459,9 @@ restartScanEntry:
BlockNumber rootPostingTree = GinGetPostingTree(itup);
GinPostingTreeScan *gdi;
Page page;
+ Pointer ptr;
+ Pointer endPtr;
+ ItemPointerData iptr = {{0,0},0};
/*
* We should unlock entry page before touching posting tree to
@@ -460,15 +484,22 @@ restartScanEntry:
IncrBufferRefCount(entry->buffer);
page = BufferGetPage(entry->buffer);
- entry->predictNumberResult = gdi->stack->predictNumber * GinPageGetOpaque(page)->maxoff;
/*
* Keep page content in memory to prevent durable page locking
*/
- entry->list = (ItemPointerData *) palloc(BLCKSZ);
- entry->nlist = GinPageGetOpaque(page)->maxoff;
- memcpy(entry->list, GinDataPageGetItem(page, FirstOffsetNumber),
- GinPageGetOpaque(page)->maxoff * sizeof(ItemPointerData));
+ entry->list = (ItemPointerData *) palloc(BLCKSZ * sizeof(ItemPointerData));
+ entry->nlist = 0;
+
+ ptr = GinDataPageGetData(page);
+ endPtr = page + GinPageGetOpaque(page)->u.endoffset;
+ while (ptr < endPtr)
+ {
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ entry->list[entry->nlist++] = iptr;
+ }
+
+ entry->predictNumberResult = gdi->stack->predictNumber * entry->nlist;
LockBuffer(entry->buffer, GIN_UNLOCK);
freeGinBtreeStack(gdi->stack);
@@ -478,8 +509,11 @@ restartScanEntry:
else if (GinGetNPosting(itup) > 0)
{
entry->nlist = GinGetNPosting(itup);
+ entry->predictNumberResult = entry->nlist;
entry->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * entry->nlist);
- memcpy(entry->list, GinGetPosting(itup), sizeof(ItemPointerData) * entry->nlist);
+
+ ginReadTuple(ginstate, entry->attnum, itup, entry->list);
+
entry->isFinished = FALSE;
}
}
@@ -583,12 +617,21 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry)
if (!ItemPointerIsValid(&entry->curItem) ||
findItemInPostingPage(page, &entry->curItem, &entry->offset))
{
+ Pointer ptr;
+ Pointer endPtr;
+ ItemPointerData iptr = {{0,0},0};
+
/*
* Found position equal to or greater than stored
*/
- entry->nlist = GinPageGetOpaque(page)->maxoff;
- memcpy(entry->list, GinDataPageGetItem(page, FirstOffsetNumber),
- GinPageGetOpaque(page)->maxoff * sizeof(ItemPointerData));
+ ptr = GinDataPageGetData(page);
+ endPtr = page + GinPageGetOpaque(page)->u.endoffset;
+ entry->nlist = 0;
+ while (ptr < endPtr)
+ {
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ entry->list[entry->nlist++] = iptr;
+ }
LockBuffer(entry->buffer, GIN_UNLOCK);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index beaa653..6177290 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -42,14 +42,16 @@ typedef struct
* items[] must be in sorted order with no duplicates.
*/
static BlockNumber
-createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
+createPostingTree(GinState *ginstate, ItemPointerData *items, uint32 nitems)
{
BlockNumber blkno;
- Buffer buffer = GinNewBuffer(index);
+ Buffer buffer = GinNewBuffer(ginstate->index);
Page page;
+ int i;
+ Pointer ptr;
+ ItemPointerData prev_iptr = {{0,0},0};
/* Assert that the items[] array will fit on one page */
- Assert(nitems <= GinMaxLeafDataItems);
START_CRIT_SECTION();
@@ -57,18 +59,26 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
page = BufferGetPage(buffer);
blkno = BufferGetBlockNumber(buffer);
- memcpy(GinDataPageGetData(page), items, sizeof(ItemPointerData) * nitems);
- GinPageGetOpaque(page)->maxoff = nitems;
+ ptr = GinDataPageGetData(page);
+ for (i = 0; i < nitems; i++)
+ {
+ if (i > 0)
+ prev_iptr = items[i - 1];
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &items[i], &prev_iptr);
+ }
+ GinPageGetOpaque(page)->u.endoffset = ptr - page;
+ Assert(GinDataPageFreeSpacePre(page, ptr) >= 0);
+ updateItemIndexes(page, ginstate);
MarkBufferDirty(buffer);
- if (RelationNeedsWAL(index))
+ if (RelationNeedsWAL(ginstate->index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
ginxlogCreatePostingTree data;
- data.node = index->rd_node;
+ data.node = ginstate->index->rd_node;
data.blkno = blkno;
data.nitem = nitems;
@@ -78,8 +88,8 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
rdata[0].next = &rdata[1];
rdata[1].buffer = InvalidBuffer;
- rdata[1].data = (char *) items;
- rdata[1].len = sizeof(ItemPointerData) * nitems;
+ rdata[1].data = GinDataPageGetData(page);
+ rdata[1].len = ptr - GinDataPageGetData(page);
rdata[1].next = NULL;
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_PTREE, rdata);
@@ -93,6 +103,137 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
return blkno;
}
+/*
+ * Form a tuple for entry tree.
+ *
+ * If the tuple would be too big to be stored, function throws a suitable
+ * error if errorTooBig is TRUE, or returns NULL if errorTooBig is FALSE.
+ *
+ * See src/backend/access/gin/README for a description of the index tuple
+ * format that is being built here. We build on the assumption that we
+ * are making a leaf-level key entry containing a posting list of nipd items.
+ * If the caller is actually trying to make a posting-tree entry, non-leaf
+ * entry, or pending-list entry, it should pass nipd = 0 and then overwrite
+ * the t_tid fields as necessary. In any case, ipd can be NULL to skip
+ * copying any itempointers into the posting list; the caller is responsible
+ * for filling the posting list afterwards, if ipd = NULL and nipd > 0.
+ */
+static IndexTuple
+GinFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category,
+ ItemPointerData *ipd,
+ uint32 nipd,
+ bool errorTooBig)
+{
+ Datum datums[3];
+ bool isnull[3];
+ IndexTuple itup;
+ uint32 newsize;
+ int i;
+ ItemPointerData nullItemPointer = {{0,0},0};
+
+ /* Build the basic tuple: optional column number, plus key datum */
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ isnull[1] = true;
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ isnull[2] = true;
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Determine and store offset to the posting list, making sure there is
+ * room for the category byte if needed.
+ *
+ * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
+ * be some wasted pad space. Is it worth recomputing the data length to
+ * prevent that? That would also allow us to Assert that the real data
+ * doesn't overlap the GinNullCategory byte, which this code currently
+ * takes on faith.
+ */
+ newsize = IndexTupleSize(itup);
+
+ GinSetPostingOffset(itup, newsize);
+
+ GinSetNPosting(itup, nipd);
+
+ /*
+ * Add space needed for posting list, if any. Then check that the tuple
+ * won't be too big to store.
+ */
+
+ if (nipd > 0)
+ {
+ newsize = ginCheckPlaceToDataPageLeaf(&ipd[0], &nullItemPointer, newsize);
+ for (i = 1; i < nipd; i++)
+ {
+ newsize = ginCheckPlaceToDataPageLeaf(&ipd[i], &ipd[i - 1], newsize);
+ }
+ }
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ newsize = newsize + sizeof(GinNullCategory);
+ }
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ if (errorTooBig)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Copy in the posting list, if provided
+ */
+ if (nipd > 0)
+ {
+ char *ptr = GinGetPosting(itup);
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &ipd[0], &nullItemPointer);
+ for (i = 1; i < nipd; i++)
+ ptr = ginDataPageLeafWriteItemPointer(ptr, &ipd[i], &ipd[i-1]);
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+ return itup;
+}
/*
* Adds array of item pointers to tuple's posting list, or
@@ -111,31 +252,30 @@ addItemPointersToLeafTuple(GinState *ginstate,
Datum key;
GinNullCategory category;
IndexTuple res;
+ ItemPointerData *newItems, *oldItems;
+ int oldNPosting, newNPosting;
Assert(!GinIsPostingTree(old));
attnum = gintuple_get_attrnum(ginstate, old);
key = gintuple_get_key(ginstate, old, &category);
+ oldNPosting = GinGetNPosting(old);
+ oldItems = (ItemPointerData *) palloc(sizeof(ItemPointerData) * oldNPosting);
+
+ newNPosting = oldNPosting + nitem;
+ newItems = (ItemPointerData *) palloc(sizeof(ItemPointerData) * newNPosting);
+
+ ginReadTuple(ginstate, attnum, old, oldItems);
+
+ newNPosting = ginMergeItemPointers(newItems,
+ items, nitem,
+ oldItems, oldNPosting);
+
/* try to build tuple with room for all the items */
res = GinFormTuple(ginstate, attnum, key, category,
- NULL, nitem + GinGetNPosting(old),
- false);
-
- if (res)
- {
- /* good, small enough */
- uint32 newnitem;
-
- /* fill in the posting list with union of old and new TIDs */
- newnitem = ginMergeItemPointers(GinGetPosting(res),
- GinGetPosting(old),
- GinGetNPosting(old),
- items, nitem);
- /* merge might have eliminated some duplicate items */
- GinShortenTuple(res, newnitem);
- }
- else
+ newItems, newNPosting, false);
+ if (!res)
{
/* posting list would be too big, convert to posting tree */
BlockNumber postingRoot;
@@ -146,9 +286,9 @@ addItemPointersToLeafTuple(GinState *ginstate,
* surely small enough to fit on one posting-tree page, and should
* already be in order with no duplicates.
*/
- postingRoot = createPostingTree(ginstate->index,
- GinGetPosting(old),
- GinGetNPosting(old));
+ postingRoot = createPostingTree(ginstate,
+ oldItems,
+ oldNPosting);
/* During index build, count the newly-added data page */
if (buildStats)
@@ -194,6 +334,19 @@ buildFreshLeafTuple(GinState *ginstate,
{
/* posting list would be too big, build posting tree */
BlockNumber postingRoot;
+ ItemPointerData prevIptr = {{0,0},0};
+ Size size = 0;
+ int itemsCount = 0;
+
+ do
+ {
+ size = ginCheckPlaceToDataPageLeaf(&items[itemsCount], &prevIptr,
+ size);
+ prevIptr = items[itemsCount];
+ itemsCount++;
+ }
+ while (itemsCount < nitem && size < GinDataPageSize);
+ itemsCount--;
/*
* Build posting-tree-only result tuple. We do this first so as to
@@ -205,16 +358,16 @@ buildFreshLeafTuple(GinState *ginstate,
* Initialize posting tree with as many TIDs as will fit on the first
* page.
*/
- postingRoot = createPostingTree(ginstate->index,
+ postingRoot = createPostingTree(ginstate,
items,
- Min(nitem, GinMaxLeafDataItems));
+ itemsCount);
/* During index build, count the newly-added data page */
if (buildStats)
buildStats->nDataPages++;
/* Add any remaining TIDs to the posting tree */
- if (nitem > GinMaxLeafDataItems)
+ if (nitem > itemsCount)
{
GinPostingTreeScan *gdi;
@@ -222,8 +375,8 @@ buildFreshLeafTuple(GinState *ginstate,
gdi->btree.isBuild = (buildStats != NULL);
ginInsertItemPointers(gdi,
- items + GinMaxLeafDataItems,
- nitem - GinMaxLeafDataItems,
+ items + itemsCount,
+ nitem - itemsCount,
buildStats);
pfree(gdi);
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index b84d3a3..5e77ac5 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -38,47 +38,177 @@ typedef struct
* if it's needed. In case of *cleaned!=NULL caller is responsible to
* have allocated enough space. *cleaned and items may point to the same
* memory address.
+ *
+ * The caller can specify the size of 'src'
*/
static uint32
-ginVacuumPostingList(GinVacuumState *gvs, ItemPointerData *items, uint32 nitem, ItemPointerData **cleaned)
+ginVacuumPostingList(GinVacuumState *gvs,
+ Pointer src, uint32 nitem, Pointer srcEnd,
+ Pointer *cleaned, Size size, Size *newSize)
{
uint32 i,
j = 0;
+ ItemPointerData iptr = {{0,0},0}, prevIptr;
+ Pointer dst = NULL, prev, ptr = src;
/*
* just scan over ItemPointer array
*/
- for (i = 0; i < nitem; i++)
+ prevIptr = iptr;
+ for (i = 0; srcEnd ? (ptr < srcEnd) : (i < nitem); i++)
{
- if (gvs->callback(items + i, gvs->callback_state))
+ prev = ptr;
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ if (gvs->callback(&iptr, gvs->callback_state))
{
gvs->result->tuples_removed += 1;
- if (!*cleaned)
+ if (!dst)
{
- *cleaned = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitem);
+ dst = (Pointer) palloc(size);
+ *cleaned = dst;
if (i != 0)
- memcpy(*cleaned, items, sizeof(ItemPointerData) * i);
+ {
+ memcpy(dst, src, prev - src);
+ dst += prev - src;
+ }
}
}
else
{
gvs->result->num_index_tuples += 1;
if (i != j)
- (*cleaned)[j] = items[i];
+ dst = ginDataPageLeafWriteItemPointer(dst, &iptr, &prevIptr);
j++;
+ prevIptr = iptr;
}
}
+ if (i != j)
+ *newSize = dst - *cleaned;
return j;
}
/*
+ * Form a tuple for entry tree based on already encoded array of item pointers
+ * with additional information.
+ */
+static IndexTuple
+GinFormTuple(GinState *ginstate,
+ OffsetNumber attnum, Datum key, GinNullCategory category,
+ Pointer data,
+ Size dataSize,
+ uint32 nipd,
+ bool errorTooBig)
+{
+ Datum datums[3];
+ bool isnull[3];
+ IndexTuple itup;
+ uint32 newsize;
+
+ /* Build the basic tuple: optional column number, plus key datum */
+ if (ginstate->oneCol)
+ {
+ datums[0] = key;
+ isnull[0] = (category != GIN_CAT_NORM_KEY);
+ isnull[1] = true;
+ }
+ else
+ {
+ datums[0] = UInt16GetDatum(attnum);
+ isnull[0] = false;
+ datums[1] = key;
+ isnull[1] = (category != GIN_CAT_NORM_KEY);
+ isnull[2] = true;
+ }
+
+ itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
+
+ /*
+ * Determine and store offset to the posting list, making sure there is
+ * room for the category byte if needed.
+ *
+ * Note: because index_form_tuple MAXALIGNs the tuple size, there may well
+ * be some wasted pad space. Is it worth recomputing the data length to
+ * prevent that? That would also allow us to Assert that the real data
+ * doesn't overlap the GinNullCategory byte, which this code currently
+ * takes on faith.
+ */
+ newsize = IndexTupleSize(itup);
+
+ GinSetPostingOffset(itup, newsize);
+
+ GinSetNPosting(itup, nipd);
+
+ /*
+ * Add space needed for posting list, if any. Then check that the tuple
+ * won't be too big to store.
+ */
+
+ if (nipd > 0)
+ {
+ newsize += dataSize;
+ }
+
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ newsize = newsize + sizeof(GinNullCategory);
+ }
+ newsize = MAXALIGN(newsize);
+
+ if (newsize > Min(INDEX_SIZE_MASK, GinMaxItemSize))
+ {
+ if (errorTooBig)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %lu exceeds maximum %lu for index \"%s\"",
+ (unsigned long) newsize,
+ (unsigned long) Min(INDEX_SIZE_MASK,
+ GinMaxItemSize),
+ RelationGetRelationName(ginstate->index))));
+ pfree(itup);
+ return NULL;
+ }
+
+ /*
+ * Resize tuple if needed
+ */
+ if (newsize != IndexTupleSize(itup))
+ {
+ itup = repalloc(itup, newsize);
+
+ /* set new size in tuple header */
+ itup->t_info &= ~INDEX_SIZE_MASK;
+ itup->t_info |= newsize;
+ }
+
+ /*
+ * Copy in the posting list, if provided
+ */
+ if (nipd > 0)
+ {
+ char *ptr = GinGetPosting(itup);
+ memcpy(ptr, data, dataSize);
+ }
+
+ /*
+ * Insert category byte, if needed
+ */
+ if (category != GIN_CAT_NORM_KEY)
+ {
+ Assert(IndexTupleHasNulls(itup));
+ GinSetNullCategory(itup, ginstate, category);
+ }
+ return itup;
+}
+
+/*
* fills WAL record for vacuum leaf page
*/
static void
-xlogVacuumPage(Relation index, Buffer buffer)
+xlogVacuumPage(Relation index, Buffer buffer, OffsetNumber maxoff)
{
Page page = BufferGetPage(buffer);
XLogRecPtr recptr;
@@ -95,13 +225,14 @@ xlogVacuumPage(Relation index, Buffer buffer)
data.node = index->rd_node;
data.blkno = BufferGetBlockNumber(buffer);
-
+ data.nitem = maxoff;
if (GinPageIsData(page))
{
backup = GinDataPageGetData(page);
- data.nitem = GinPageGetOpaque(page)->maxoff;
- if (data.nitem)
- len = MAXALIGN(sizeof(ItemPointerData) * data.nitem);
+ if (GinPageIsLeaf(page))
+ len = page + GinPageGetOpaque(page)->u.endoffset - GinDataPageGetData(page);
+ else if (maxoff)
+ len = MAXALIGN(GinDataPageSize - GinNonLeafDataPageGetFreeSpace(page));
}
else
{
@@ -176,30 +307,37 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
if (GinPageIsLeaf(page))
{
- OffsetNumber newMaxOff,
- oldMaxOff = GinPageGetOpaque(page)->maxoff;
- ItemPointerData *cleaned = NULL;
+ Pointer cleaned = NULL;
+ Size newSize;
+ Size oldSize;
+ Pointer endPtr = page + GinPageGetOpaque(page)->u.endoffset;
+ OffsetNumber newMaxOff;
+ oldSize = GinDataPageSize - GinLeafDataPageGetFreeSpace(page);
newMaxOff = ginVacuumPostingList(gvs,
- (ItemPointer) GinDataPageGetData(page), oldMaxOff, &cleaned);
+ GinDataPageGetData(page), 0, endPtr,
+ &cleaned,
+ oldSize, &newSize);
/* saves changes about deleted tuple ... */
- if (oldMaxOff != newMaxOff)
+ if (cleaned)
{
START_CRIT_SECTION();
if (newMaxOff > 0)
- memcpy(GinDataPageGetData(page), cleaned, sizeof(ItemPointerData) * newMaxOff);
+ memcpy(GinDataPageGetData(page), cleaned, newSize);
+
pfree(cleaned);
- GinPageGetOpaque(page)->maxoff = newMaxOff;
+ GinPageGetOpaque(page)->u.endoffset = GinDataPageGetData(page) + newSize - page;
+ updateItemIndexes(page, &gvs->ginstate);
MarkBufferDirty(buffer);
- xlogVacuumPage(gvs->index, buffer);
+ xlogVacuumPage(gvs->index, buffer, newMaxOff);
END_CRIT_SECTION();
/* if root is a leaf page, we don't desire further processing */
- if (!isRoot && GinPageGetOpaque(page)->maxoff < FirstOffsetNumber)
+ if (!isRoot && newMaxOff < FirstOffsetNumber)
hasVoidPage = TRUE;
}
}
@@ -208,7 +346,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
OffsetNumber i;
bool isChildHasVoid = FALSE;
- for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->maxoff; i++)
+ for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->u.maxoff; i++)
{
PostingItem *pitem = (PostingItem *) GinDataPageGetItem(page, i);
@@ -391,6 +529,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, DataPageDel
Buffer buffer;
Page page;
bool meDelete = FALSE;
+ bool isempty;
if (isRoot)
{
@@ -420,7 +559,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, DataPageDel
OffsetNumber i;
me->blkno = blkno;
- for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->maxoff; i++)
+ for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->u.maxoff; i++)
{
PostingItem *pitem = (PostingItem *) GinDataPageGetItem(page, i);
@@ -429,17 +568,19 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, DataPageDel
}
}
- if (GinPageGetOpaque(page)->maxoff < FirstOffsetNumber)
+ if (GinPageIsLeaf(page))
+ isempty = page + GinPageGetOpaque(page)->u.endoffset == GinDataPageGetData(page);
+ else
+ isempty = GinPageGetOpaque(page)->u.maxoff < FirstOffsetNumber;
+
+ if (isempty)
{
if (!(me->leftBlkno == InvalidBlockNumber && GinPageRightMost(page)))
{
/* we never delete right most branch */
Assert(!isRoot);
- if (GinPageGetOpaque(page)->maxoff < FirstOffsetNumber)
- {
- ginDeletePage(gvs, blkno, me->leftBlkno, me->parent->blkno, myoff, me->parent->isRoot);
- meDelete = TRUE;
- }
+ ginDeletePage(gvs, blkno, me->leftBlkno, me->parent->blkno, myoff, me->parent->isRoot);
+ meDelete = TRUE;
}
}
@@ -520,8 +661,14 @@ ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
* if we already create temporary page, we will make changes in
* place
*/
- ItemPointerData *cleaned = (tmppage == origpage) ? NULL : GinGetPosting(itup);
- uint32 newN = ginVacuumPostingList(gvs, GinGetPosting(itup), GinGetNPosting(itup), &cleaned);
+ Size cleanedSize;
+ Pointer cleaned = NULL;
+ uint32 newN =
+ ginVacuumPostingList(gvs,
+ GinGetPosting(itup), GinGetNPosting(itup), NULL,
+ &cleaned,
+ IndexTupleSize(itup) - GinGetPostingOffset(itup),
+ &cleanedSize);
if (GinGetNPosting(itup) != newN)
{
@@ -542,23 +689,16 @@ ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
*/
tmppage = PageGetTempPageCopy(origpage);
- if (newN > 0)
- {
- Size pos = ((char *) GinGetPosting(itup)) - ((char *) origpage);
-
- memcpy(tmppage + pos, cleaned, sizeof(ItemPointerData) * newN);
- }
-
- pfree(cleaned);
-
/* set itup pointer to new page */
itup = (IndexTuple) PageGetItem(tmppage, PageGetItemId(tmppage, i));
}
attnum = gintuple_get_attrnum(&gvs->ginstate, itup);
key = gintuple_get_key(&gvs->ginstate, itup, &category);
+ /* FIXME */
itup = GinFormTuple(&gvs->ginstate, attnum, key, category,
- GinGetPosting(itup), newN, true);
+ cleaned, cleanedSize, newN, true);
+ pfree(cleaned);
PageIndexTupleDelete(tmppage, i);
if (PageAddItem(tmppage, (Item) itup, IndexTupleSize(itup), i, false, false) != i)
@@ -662,7 +802,7 @@ ginbulkdelete(PG_FUNCTION_ARGS)
START_CRIT_SECTION();
PageRestoreTempPage(resPage, page);
MarkBufferDirty(buffer);
- xlogVacuumPage(gvs.index, buffer);
+ xlogVacuumPage(gvs.index, buffer, GinPageGetOpaque(page)->u.maxoff);
UnlockReleaseBuffer(buffer);
END_CRIT_SECTION();
}
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 5daabb0..a520cbf 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -106,9 +106,12 @@ static void
ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
{
ginxlogCreatePostingTree *data = (ginxlogCreatePostingTree *) XLogRecGetData(record);
- ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogCreatePostingTree));
+ Pointer ptr = XLogRecGetData(record) + sizeof(ginxlogCreatePostingTree), tmp;
Buffer buffer;
Page page;
+ GinState ginstate;
+ ItemPointerData iptr = {{0, 0}, 0};
+ OffsetNumber i;
/* Backup blocks are not used in create_ptree records */
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
@@ -118,11 +121,19 @@ ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
page = (Page) BufferGetPage(buffer);
GinInitBuffer(buffer, GIN_DATA | GIN_LEAF);
- memcpy(GinDataPageGetData(page), items, sizeof(ItemPointerData) * data->nitem);
- GinPageGetOpaque(page)->maxoff = data->nitem;
+
+ tmp = ptr;
+ for (i = 1; i <= data->nitem; i++)
+ tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
+
+ memcpy(GinDataPageGetData(page), ptr, tmp - ptr);
+
+ GinPageGetOpaque(page)->u.endoffset = tmp - ptr;
PageSetLSN(page, lsn);
+ updateItemIndexes(page, &ginstate);
+
MarkBufferDirty(buffer);
UnlockReleaseBuffer(buffer);
}
@@ -182,14 +193,48 @@ ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
if (data->isLeaf)
{
+ ItemPointer startIptr = (ItemPointer) (XLogRecGetData(record) + sizeof(ginxlogInsert));
OffsetNumber i;
- ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
+ OffsetNumber j;
+ Pointer dataPtr = (Pointer)(startIptr + 1);
+ GinState ginstate;
+ ItemPointerData iptr = {{0, 0}, 0}, prev_iptr;
+ char pageCopy[BLCKSZ];
+ Pointer ptr, destPtr, dataFinish;
+ Pointer endPtr;
Assert(GinPageIsLeaf(page));
Assert(data->updateBlkno == InvalidBlockNumber);
- for (i = 0; i < data->nitem; i++)
- GinDataPageAddItem(page, items + i, data->offset + i);
+ memcpy(pageCopy, page, BLCKSZ);
+ ptr = GinDataPageGetData(page);
+ for (i = 1; i < data->offset ; i++)
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+
+ ptr = GinDataPageGetData(pageCopy);
+ for (i = 1; i < data->offset ; i++)
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+
+ prev_iptr = iptr;
+ destPtr = page + (ptr - pageCopy);
+
+ dataFinish = dataPtr;
+ for (j = 0; j < data->nitem; j++)
+ dataFinish = ginDataPageLeafReadItemPointer(dataFinish, &prev_iptr);
+
+ memcpy(destPtr, dataPtr, dataFinish - dataPtr);
+ destPtr += dataFinish - dataPtr;
+
+ endPtr = pageCopy + GinPageGetOpaque(pageCopy)->u.endoffset;
+ while (ptr < endPtr)
+ {
+ ptr = ginDataPageLeafReadItemPointer(ptr, &iptr);
+ destPtr = ginDataPageLeafWriteItemPointer(destPtr, &iptr, &prev_iptr);
+ prev_iptr = iptr;
+ }
+
+ GinPageGetOpaque(page)->u.endoffset = destPtr - page;
+ updateItemIndexes(page, &ginstate);
}
else
{
@@ -206,7 +251,7 @@ ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
pitem = (PostingItem *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
- GinDataPageAddItem(page, pitem, data->offset);
+ GinPageAddPostingItem(page, pitem, data->offset);
}
}
else
@@ -255,6 +300,7 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
Page lpage,
rpage;
uint32 flags = 0;
+ GinState ginstate;
if (data->isLeaf)
flags |= GIN_LEAF;
@@ -280,31 +326,62 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
if (data->isData)
{
char *ptr = XLogRecGetData(record) + sizeof(ginxlogSplit);
- Size sizeofitem = GinSizeOfDataPageItem(lpage);
OffsetNumber i;
ItemPointer bound;
- for (i = 0; i < data->separator; i++)
+ if (data->isLeaf)
{
- GinDataPageAddItem(lpage, ptr, InvalidOffsetNumber);
- ptr += sizeofitem;
+ Pointer tmp, ptr2;
+ ItemPointerData iptr = {{0, 0}, 0};
+ Size lsize, rsize;
+
+ tmp = ptr;
+ for (i = 1; i <= data->separator; i++)
+ tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
+ lsize = tmp - ptr;
+ ptr2 = ptr + MAXALIGN(lsize);
+ tmp = ptr2;
+ for (; i <= data->nitem; i++)
+ tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
+ rsize = tmp - ptr2;
+
+ Assert(lsize < GinDataPageSize);
+ Assert(rsize < GinDataPageSize);
+
+ memcpy(GinDataPageGetData(lpage), ptr, lsize);
+ memcpy(GinDataPageGetData(rpage), ptr2, rsize);
+
+ GinPageGetOpaque(lpage)->u.endoffset = ptr + lsize - lpage;
+ GinPageGetOpaque(rpage)->u.endoffset = ptr2 + rsize - rpage;
+ *GinDataPageGetRightBound(lpage) = updateItemIndexes(lpage, &ginstate);
+ updateItemIndexes(rpage, &ginstate);
+
+ *GinDataPageGetRightBound(rpage) = data->rightbound;
+
+ Assert(GinPageGetOpaque(lpage)->flags == flags);
+ Assert(GinPageGetOpaque(rpage)->flags == flags);
}
-
- for (i = data->separator; i < data->nitem; i++)
+ else
{
- GinDataPageAddItem(rpage, ptr, InvalidOffsetNumber);
- ptr += sizeofitem;
- }
+ PostingItem *pitem = (PostingItem *) ptr;
+ for (i = 0; i < data->separator; i++)
+ {
+ GinPageAddPostingItem(lpage, pitem, InvalidOffsetNumber);
+ ptr++;
+ }
- /* set up right key */
- bound = GinDataPageGetRightBound(lpage);
- if (data->isLeaf)
- *bound = *(ItemPointerData *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff);
- else
- *bound = ((PostingItem *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->maxoff))->key;
+ for (i = data->separator; i < data->nitem; i++)
+ {
+ GinPageAddPostingItem(rpage, pitem, InvalidOffsetNumber);
+ ptr++;
+ }
+ /* set up right key */
+ bound = GinDataPageGetRightBound(lpage);
+ *bound = ((PostingItem *) GinDataPageGetItem(lpage, GinPageGetOpaque(lpage)->u.maxoff))->key;
- bound = GinDataPageGetRightBound(rpage);
- *bound = data->rightbound;
+ bound = GinDataPageGetRightBound(rpage);
+ *bound = data->rightbound;
+ }
}
else
{
@@ -390,10 +467,30 @@ ginRedoVacuumPage(XLogRecPtr lsn, XLogRecord *record)
{
if (GinPageIsData(page))
{
- memcpy(GinDataPageGetData(page),
- XLogRecGetData(record) + sizeof(ginxlogVacuumPage),
- data->nitem * GinSizeOfDataPageItem(page));
- GinPageGetOpaque(page)->maxoff = data->nitem;
+ if (GinPageIsLeaf(page))
+ {
+ GinState ginstate;
+ ItemPointerData iptr = {{0, 0}, 0};
+ Pointer ptr, tmp;
+ OffsetNumber i;
+
+ ptr = XLogRecGetData(record) + sizeof(ginxlogVacuumPage);
+ tmp = ptr;
+ for (i = 1; i <= data->nitem; i++)
+ tmp = ginDataPageLeafReadItemPointer(tmp, &iptr);
+
+ memcpy(GinDataPageGetData(page), ptr, tmp - ptr);
+
+ GinPageGetOpaque(page)->u.endoffset = tmp - page;
+ updateItemIndexes(page, &ginstate);
+ }
+ else
+ {
+ memcpy(GinDataPageGetData(page),
+ XLogRecGetData(record) + sizeof(ginxlogVacuumPage),
+ data->nitem * sizeof(ItemPointerData));
+ GinPageGetOpaque(page)->u.maxoff = data->nitem;
+ }
}
else
{
@@ -554,7 +651,7 @@ ginRedoUpdateMetapage(XLogRecPtr lsn, XLogRecord *record)
/*
* Increase counter of heap tuples
*/
- GinPageGetOpaque(page)->maxoff++;
+ GinPageGetOpaque(page)->u.maxoff++;
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
@@ -621,11 +718,11 @@ ginRedoInsertListPage(XLogRecPtr lsn, XLogRecord *record)
{
/* tail of sublist */
GinPageSetFullRow(page);
- GinPageGetOpaque(page)->maxoff = 1;
+ GinPageGetOpaque(page)->u.maxoff = 1;
}
else
{
- GinPageGetOpaque(page)->maxoff = 0;
+ GinPageGetOpaque(page)->u.maxoff = 0;
}
for (i = 0; i < data->ntuples; i++)
@@ -802,12 +899,7 @@ ginContinueSplit(ginIncompleteSplit *split)
ginPrepareDataScan(&btree, reln);
PostingItemSetBlockNumber(&(btree.pitem), split->leftBlkno);
- if (GinPageIsLeaf(page))
- btree.pitem.key = *(ItemPointerData *) GinDataPageGetItem(page,
- GinPageGetOpaque(page)->maxoff);
- else
- btree.pitem.key = ((PostingItem *) GinDataPageGetItem(page,
- GinPageGetOpaque(page)->maxoff))->key;
+ btree.pitem.key = *GinDataPageGetRightBound(page);
}
btree.rightblkno = split->rightBlkno;
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index c603521..92cdf0b 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -32,11 +32,14 @@
typedef struct GinPageOpaqueData
{
BlockNumber rightlink; /* next page if any */
- OffsetNumber maxoff; /* number entries on GIN_DATA page: number of
- * heap ItemPointers on GIN_DATA|GIN_LEAF page
- * or number of PostingItems on GIN_DATA &
- * ~GIN_LEAF page. On GIN_LIST page, number of
- * heap tuples. */
+ union
+ {
+ OffsetNumber maxoff; /* number entries on GIN_DATA page: number of
+ * PostingItems on GIN_DATA & ~GIN_LEAF page.
+ * On GIN_LIST page, number of heap tuples. */
+ uint16 endoffset; /* beginning of free space on a
+ * GIN_DATA|GIN_LEAF page */
+ } u;
uint16 flags; /* see bit definitions below */
} GinPageOpaqueData;
@@ -196,10 +199,16 @@ typedef signed char GinNullCategory;
#define GinCategoryOffset(itup,ginstate) \
(IndexInfoFindDataOffset((itup)->t_info) + \
((ginstate)->oneCol ? 0 : sizeof(int16)))
-#define GinGetNullCategory(itup,ginstate) \
+/*#define GinGetNullCategory(itup,ginstate) \
(*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))))
#define GinSetNullCategory(itup,ginstate,c) \
- (*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))) = (c))
+ (*((GinNullCategory *) ((char*)(itup) + GinCategoryOffset(itup,ginstate))) = (c))*/
+
+#define GinGetNullCategory(itup,ginstate) \
+ (*((GinNullCategory *) ((char*)(itup) + IndexTupleSize(itup) - sizeof(GinNullCategory))))
+#define GinSetNullCategory(itup,ginstate,c) \
+ (*((GinNullCategory *) ((char*)(itup) + IndexTupleSize(itup) - sizeof(GinNullCategory))) = (c))
+
/*
* Access macros for leaf-page entry tuples (see discussion in README)
@@ -213,11 +222,11 @@ typedef signed char GinNullCategory;
#define GinGetPostingOffset(itup) GinItemPointerGetBlockNumber(&(itup)->t_tid)
#define GinSetPostingOffset(itup,n) ItemPointerSetBlockNumber(&(itup)->t_tid,n)
-#define GinGetPosting(itup) ((ItemPointer) ((char*)(itup) + GinGetPostingOffset(itup)))
+#define GinGetPosting(itup) ((Pointer) ((char*)(itup) + GinGetPostingOffset(itup)))
#define GinMaxItemSize \
MAXALIGN_DOWN(((BLCKSZ - SizeOfPageHeaderData - \
- MAXALIGN(sizeof(GinPageOpaqueData))) / 3 - sizeof(ItemIdData)))
+ MAXALIGN(sizeof(GinPageOpaqueData))) / 6 - sizeof(ItemIdData)))
/*
* Access macros for non-leaf entry tuples
@@ -232,16 +241,19 @@ typedef signed char GinNullCategory;
#define GinDataPageGetRightBound(page) ((ItemPointer) PageGetContents(page))
#define GinDataPageGetData(page) \
(PageGetContents(page) + MAXALIGN(sizeof(ItemPointerData)))
-#define GinSizeOfDataPageItem(page) \
- (GinPageIsLeaf(page) ? sizeof(ItemPointerData) : sizeof(PostingItem))
#define GinDataPageGetItem(page,i) \
- (GinDataPageGetData(page) + ((i)-1) * GinSizeOfDataPageItem(page))
+ ((PostingItem *)(GinDataPageGetData(page) + ((i)-1) * sizeof(PostingItem)))
-#define GinDataPageGetFreeSpace(page) \
+#define GinNonLeafDataPageGetFreeSpace(page) \
(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
- MAXALIGN(sizeof(ItemPointerData)) \
- - GinPageGetOpaque(page)->maxoff * GinSizeOfDataPageItem(page) \
+ - GinPageGetOpaque(page)->u.maxoff * sizeof(PostingItem) \
- MAXALIGN(sizeof(GinPageOpaqueData)))
+#define GinLeafDataPageGetFreeSpace(page) \
+ (BLCKSZ \
+ - GinPageGetOpaque(page)->u.endoffset \
+ - MAXALIGN(sizeof(GinPageOpaqueData)) \
+ - MAXALIGN(sizeof(GinDataLeafItemIndex) * GinDataLeafIndexCount))
#define GinMaxLeafDataItems \
((BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \
@@ -255,6 +267,29 @@ typedef signed char GinNullCategory;
#define GinListPageSize \
( BLCKSZ - SizeOfPageHeaderData - MAXALIGN(sizeof(GinPageOpaqueData)) )
+typedef struct
+{
+ ItemPointerData iptr;
+ OffsetNumber offsetNumber;
+ uint16 pageOffset;
+} GinDataLeafItemIndex;
+
+#define GinDataLeafIndexCount 32
+
+#define GinDataPageSize \
+ (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+ - MAXALIGN(sizeof(ItemPointerData)) \
+ - MAXALIGN(sizeof(GinPageOpaqueData)) \
+ - MAXALIGN(sizeof(GinDataLeafItemIndex) * GinDataLeafIndexCount))
+
+#define GinDataPageFreeSpacePre(page,ptr) \
+ (GinDataPageSize \
+ - ((ptr) - GinDataPageGetData(page)))
+
+#define GinPageGetIndexes(page) \
+ ((GinDataLeafItemIndex *)(GinDataPageGetData(page) + GinDataPageSize))
+
+
/*
* Storage type for GIN's reloptions
*/
@@ -519,22 +554,24 @@ extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
extern void ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno);
/* ginentrypage.c */
-extern IndexTuple GinFormTuple(GinState *ginstate,
- OffsetNumber attnum, Datum key, GinNullCategory category,
- ItemPointerData *ipd, uint32 nipd, bool errorTooBig);
-extern void GinShortenTuple(IndexTuple itup, uint32 nipd);
extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
Datum key, GinNullCategory category,
GinState *ginstate);
extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
extern IndexTuple ginPageGetLinkItup(Buffer buf);
+extern void ginReadTuple(GinState *ginstate, OffsetNumber attnum,
+ IndexTuple itup, ItemPointerData *ipd);
+extern ItemPointerData updateItemIndexes(Page page, GinState *ginstate);
/* gindatapage.c */
+extern char *ginDataPageLeafWriteItemPointer(char *ptr, ItemPointer iptr, ItemPointer prev);
+extern Size ginCheckPlaceToDataPageLeaf(ItemPointer iptr, ItemPointer prev,
+ Size size);
extern uint32 ginMergeItemPointers(ItemPointerData *dst,
ItemPointerData *a, uint32 na,
ItemPointerData *b, uint32 nb);
-extern void GinDataPageAddItem(Page page, void *data, OffsetNumber offset);
+extern void GinPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
typedef struct
@@ -724,6 +761,53 @@ extern void ginInsertCleanup(GinState *ginstate,
bool vac_delay, IndexBulkDeleteResult *stats);
/*
+ * Function for reading packed ItemPointers. Used in various .c files and
+ * have to be inline for being fast.
+ *
+ * Read next item pointer from leaf data page. Replaces current item pointer
+ * with the next one. Zero item pointer should be passed in order to read the
+ * first item pointer.
+ */
+static inline char *
+ginDataPageLeafReadItemPointer(char *ptr, ItemPointer iptr)
+{
+ uint32 blockNumberIncr;
+ uint16 offset;
+ int i;
+ uint8 v;
+
+ i = 0;
+ blockNumberIncr = 0;
+ do
+ {
+ v = *ptr;
+ ptr++;
+ blockNumberIncr |= (uint32) (v & (~HIGHBIT)) << i;
+ i += 7;
+ }
+ while (IS_HIGHBIT_SET(v));
+
+ blockNumberIncr += iptr->ip_blkid.bi_lo + (iptr->ip_blkid.bi_hi << 16);
+
+ iptr->ip_blkid.bi_lo = blockNumberIncr & 0xFFFF;
+ iptr->ip_blkid.bi_hi = (blockNumberIncr >> 16) & 0xFFFF;
+
+ i = 0;
+ offset = 0;
+ do
+ {
+ v = *ptr;
+ ptr++;
+ offset |= (uint32) (v & (~HIGHBIT)) << i;
+ i += 7;
+ } while(IS_HIGHBIT_SET(v));
+
+ iptr->ip_posid = offset;
+
+ return ptr;
+}
+
+/*
* Merging the results of several gin scans compares item pointers a lot,
* so we want this to be inlined. But if the compiler doesn't support that,
* fall back on the non-inline version from itemptr.c. See STATIC_IF_INLINE in
On Sun, Jun 30, 2013 at 2:50 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
On 29.06.2013 20:08, Heikki Linnakangas wrote:
On 29.06.2013 11:56, Heikki Linnakangas wrote:
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..Here's the correct version. I've probably broken things, sorry about that.
I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates. Also,
please take some time to consider the open questions I listed here:
archives.postgresql.org/**message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
.
Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.
------
With best regards,
Alexander Korotkov.
On Thu, Jun 27, 2013 at 6:20 PM, Antonin Houska <antonin.houska@gmail.com>wrote:
On 06/25/2013 12:03 AM, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:**dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever* gin_private.h:**ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.* gindatapage.c:**ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.
Hi!
Thanks for pointing these.
------
With best regards,
Alexander Korotkov.
On 01.07.2013 13:28, Alexander Korotkov wrote:
Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.
Yep, sounds good!
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
I've done a fair amount of testing by loading pgsql-general archives
into a database and running a bunch of simple ts queries that use a GIN
index.
I've tested this as well as the two other patches, but as I was able to
get meaningful results only from this patch, I'll post the results here
and info about segfaults and other observed errors to the other threads.
First of all - update the commitfest page whenever you submit a new
patch version, please. I've spent two or three hours testing and
debugging a patches linked from those pages only to find out that there
are newer versions. I should have checked that initially, but let's keep
that updated.
I wan't able to apply the patches to the current head, so I've used
b8fd1a09 (from 17/06) as a base commit.
The following table shows these metrics:
* data load
- how long it took to import ~200k messages from the list archive
- includes a lot of time spent in Python (parsing), checking FKs ...
- so unless this is significantly higher, it's probably OK
* index size
- size of the main GIN index on message body
* 1/2/3-word(s)
- number of queries in the form
SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
LIMIT 100
(executed over 60 seconds, and 'per second' speed)
All the scripts are available at https://bitbucket.org/tvondra/archie
Now, the results:
no patches:
data load: 710 s
index size: 545 MB
1 word: 37500 (630/s)
2 words: 49800 (800/s)
3 words: 40000 (660/s)
additional info (ginaddinfo.7.patch):
data load: 693 s
index size: 448 MB
1 word: 135000 (2250/s)
2 words: 85000 (1430/s)
3 words: 54000 ( 900/s)
additional info + fast scan (gin_fast_scan.4.patch):
data load: 720 s
index size: 455 MB
1 word: FAIL
2 words: FAIL
3 words: FAIL
additional info + fast scan + ordering (gin_ordering.4.patch):
data load: FAIL
index size: N/A
1 word: N/A
2 words: N/A
3 words: N/A
So the speedup after adding info into GIN seems very promising, although
I don't quite understand why searching for two words is so much slower.
Also the index size seems to decrease significantly.
After applying 'fast scan' the things started to break down, so I wasn't
able to run the queries and then even the load failed consistently.
I'll post the info into the appropriate threads.
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.
It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today.
------
With best regards,
Alexander Korotkov.
Attachments:
Please fix compiler warnings:
gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:706:24: warning: unused variable ‘pageBackup’ [-Wunused-variable]
gindatapage.c: In function ‘updateItemIndexes’:
gindatapage.c:1182:6: warning: variable ‘totalsize’ set but not used [-Wunused-but-set-variable]
gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:633:28: warning: ‘startI’ may be used uninitialized in this function [-Wuninitialized]
gindatapage.c:617:21: note: ‘startI’ was declared here
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today..
Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.
A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality.
And potentially useful elsewhere, too. It would be good to separate that
into a separate .c/.h files. I'm thinking of
src/backend/access/gin/packeditemptr.c, which would contain
ginDataPageLeafReadItemPointer, ginDataPageLeafWriteItemPointer and
ginDataPageLeafGetItemPointerSize functions. A new typedef for
varbyte-encoded things would probably be good too, something like
"typedef char *PackedItemPointer". In the new .c file, please also add a
file-header comment explaining the encoding.
The README really needs to be updated. The posting tree page structure
is a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.
It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending
on what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
Apparently some bugs breaks index on huge updates independent on
incremental update of the 32-entry. I'm in debugging now. That's why
benchmarks are delayed.
------
With best regards,
Alexander Korotkov.
On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.
Yes
A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
Finally, I've debugged index update.
There are benchmark scripts attached which I used for testing. bench.sh is
doing following:
1) Switches ~/postgres to given branch, configures and compiles it
2) Initializes cluster, runs postgres, imports mailing lists archives which
could be downloaded from here:
http://mira.sai.msu.ru/~megera/tmp/pg-mailing/archives-9.1-main.dump.gz
3) Runs index creation measuring taken time and index size.
4) Runs set of index search queries measuring overall taken time and number
of used blocks. Queries was extracted from mailing lists web server logs.
So, they are real-life.
5) Runs huge updates and vacuums measuring overall taken time and final
index size.
6) Rerun set of queries.
The results of master branch:
Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:52.154299
index_update | 00:10:42.982413
search_new | 00:26:14.294872
search_updated | 00:27:06.10298
(4 rows)
Numbers of blocks used in search (not very representative, because it's
mostly consumed by heap fetches):
label | blocks_mark
----------------+-------------
search_new | 848156708
search_updated | 885122373
(2 rows)
Size of index newly created and after updates:
label | size
---------------+------------
new | 884514816
after_updates | 1595252736
(2 rows)
The results of packed posting lists branch.
Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:54.363988
index_update | 00:08:55.291099
search_new | 00:26:06.262403
search_updated | 00:27:07.501142
(4 rows)
Numbers of blocks used in search:
label | blocks_mark
----------------+-------------
search_new | 847591514
search_updated | 883928608
(2 rows)
Size of index newly created and after updates:
label | size
---------------+-----------
new | 421330944
after_updates | 718921728
(2 rows)
We can see there is no significant slowdown. Updates even becomes faster
probably because of reduced index size.
Now, I'm going to take care about WAL, refactoring and documentation.
------
With best regards,
Alexander Korotkov.
Attachments:
bench.tar.gzapplication/x-gzip; name=bench.tar.gzDownload
� �8R �]�rGv�o>E��D��I�*Yviw)���H������1��97���*����Ry��O�b�~�t��@]����<D�m����������9=�3ZKF����������';���������g����76����[�;��uJQ�+KR���"RQz]����}�����7��2|��VV�U����=}��s�����@<�~���K~���WV�9�W�����|���J:�
�/��"v�,����
�������!���kW2e+�re�qY%��D���w��p.�,e�oW��l5l���k6s��R�b�X��H�^y���^����� ���MT+k��(<�^�[]!�b��������Vc�4�����
�V����z��%W~!���P���Y������u�J-�R���j+*`���=r����<��~�"C��C��_{����?J����7�E�JF\����������f���qMe��k���J�?�O��L����;���B����]f������V]�#�o��"Y}���p��Y�yo��M�<��C�I#����[�F�����X����F��`(<�j:��_�1J���_
����"��}C ������'�;�|c-�JWV.�~�W���c^3]��f��x���c���|���|���m������[w��m���{��=�{s�;<��1L�H��K"L��W�/��>?��Tb�w���C��T\�_�\]�ix?�����H��Or�P����b�v�e��SHs�&�TKa�'��I���w����~_�o��^1���*�F�yw����=�=��_�PM�?<������Q���y����y���W�a&}���J���h����'��{o&��^1��yg'��L��@K�����ek�#kM6A�b���O2>Oe����k�o������5���.�z�+��M�~����w����������}���t���2��������|��X��]�Mcl����0|r��An�����q#m�����1v��b��H�
!��fM~�k�����x��x�����<1���������6^1���~aq����AV�9s����}c�Ml���in>������qO������~o}bN
��c��f��f��f��f��fxl�����Kk��>�C����S�Jh���_�o��9v��5��#�F>�o/��7�������m�;����D�w��-�&^�b��b��e�?���>�a
��t���E��2���~�������{�����<�<�0~�/��{��{'{�����������9y��U�RTA��C$CHRl_�//$� �/���{��{��RUC�/�1{`�p�HA����}��)�����rmR����3��D���Hy����%&��]�VA �~�gA��o`�7���P�/��.K��<�=�
<Efp�e4-��iI�����m �Dkc!�����W���%����6�����z�m���T�k���<�>/�i_�;�ex)����JU�
}���(e� ��.�i_�����G�vB�\*��&���E!�
A�jh*i�F�$f�d4-fb���+R ��S�@3��'0�<O WC���,=f�p��`}�l}��Z.�W�c��9�g�� H�q����ml���)\7��{Sb;��^���$~JG =�,W�
|���11�&��=]���0�/q7��B�-�'���iy��L0Y�3��op����(���q�^���#��2t�1���L�K��
S���+y)|� �y�R[t�Tw������lXA;`�;4�
�7�n�ujh2#�N�<I�]�z ����VH
<���HSd�u��{���,�})\Dn��S��vd3!���}20�,�S>�C��!:��P��Ct�w�%X������Y��E�jH�K&+_�c��Y��#W��C��6h:�2\RYK:����$w����373��,������4�2a)oJ�!������Amt�']�W����)G'wBvN�����1�=�� xq�3?5�$��^���s2����9��e��G�j�/LJC`O�<�!_� ���
��q�����%Y
Z%�`�<r���H`l�y�'kt rJ����zz��y�K�i�:��-�$�zX�����E�q��s��������D��]�������C�'X"Y�F�fg��Z�Jh�:���83ASXq�'fL����(���J`���������EYg� i ���D�!\�-&"3��7�`�A&wd7 :��yq|t��_�`�0��4��`���S���btz��)��u��\� n�/���J *��b�������������G�_�UG#�'p���r%M���9x#9����2m�8#m$��MT�5�\i����)�
�k������L��NN�w��og��d�q|���L�G�)EZ";��_^w�6 (��}�<3��]�$�>�7Db�M���+���\����l�Y��$�}���!:�M��f�� I�����SB������Tq��L�?����e��Z��#�0#���Uex(����kS���C�������]l�kPc�a�q�����c�d)����)<��P�L��]�l���8#p�����*��k�)BA���$7t�G7�q7%hV`*a��?,���f��,��d��{q��
�o1��~��o�,��,�7��dfg���B1�ehUKm��Lrb)e.]���s�U��~�Ek�M�45��"�"�4�X�[��}A�7���S�r�`��547���
I�s�����rJx�"�DR�$'\ )&�������������p|n��Lv{��C�ZU_2<�"���,E���)���q���6Om_��rD/jh�EM*�5&X�����(�1E�H�RF%l���K���T��f�������.`�2���;X%� �n�N`�
�*s�a#=���>��8��f)�y�1W`��L0��9SQ�T� ��J,���;wRt������}.xsGE�@(��G���b��
T�Yd��U+,�,���]5�Y
���K:`��:��
��z
r j�`��������%~#�$�H���a#���/j���5����0HYOV�)��M�����H��H��3����0P�9���G!�/�D�"|7B���e1�_[A;�����=���� �$4��&Z���������� QA�D1��t
M18������:��'�RNB���4�H��`-�����EK|�����f��i�;��l�J�����d�����=H�"�;�wbI��%~��f���Xj�G�2�����o������~�+��I��v�S��l&�!j�vu�.�
.����<p)E�����%�:�O�������
<�/n�w�9��F�b"?��2'�V��e]0F� R`�'�X��5(* �@�Iq��J��\b758]
*a���u��f�V�s��8��Y�����)������6VN�)�L1�e��+(�������n���u+��G���b��M����&�}Q�,��3����m)WK)��������Ij�Y�����a[7G��0���������cc�����������M+����E�kh��J30�g����$���#��t���j M�}Q�w����5$�����\��xL��]�B���5� �#��t�����0��6KA2� ���G�
M�a$T���h���9y_�mA�r��^�+��R�8\�$U�������}��qm�Y�;�w"�"����iNX29��������&���rO�;!�'PA�}k�/��ku�L�xv��= }��N��1�s4����;N0S$3��x7��������<?<�n�m�.�6n�cq�$x^S\�Xv��,��2�mV��� �Z��h��F,�������,?Wp
n_����Z�}�d�Pp��"�8��\z�L� 2����w��/<�� �/����Kf�F�p���~��a'VP��%?
�����_�zBrr�P
��6�$iS�O:��'
<A����EG���O�'`SH&���R%=���G�2���B�
�
5�=z?l���F��� �/�_Ar�`
��In�+5c3q����W�Z�"�G �]F)+i2�<� ���%���~����.W\��k���[k����#�~.���y,�l�Z���f
��|�0bIl����y�����R�[7r8���'�o�z�{�y]s�
����O���_����~�b�^U�q�~2��/�1h�%H|F5Q ����J��'i_�����G��>��X(�F
�5�B0�x��R�,�0�EWP�Y����kR��<vso�"p��hC�����X���84a������4����A5t���B�^�-,�c
&����<���NZ]��\&��)r
<��*/"�/l�<i��&`���g����b��s��A
O%�BI������Z$��!���z,�8(�_YR������{2��8nr���1�aMV�S����Gp��$��K�lG�gd� d��P��Zz�,���*eV3�A��6 �&hTX�w�U�L�4���<:0�t��������9������&`
w!�pr*w��H8���&�����I@1�r_�v�]�
L����Y%��q�/3�2�I~�d>����q��F�o��]��`M�������K����I�F�S0��4��!�6���i��bv�F!��>�)��)�T[��$Q�y�l(�,J����8T�G8��������?���T\CS�|6�bJiUnx>h���tQW�L'�"���@0gC��x��H�p6ge-�0�/�`o��wr���w l�i*�'��%�� � �`O��Io��X���OJ�C|y]\�Nz������H:q���;:8�=�"�����}�M���L����l<�]��;�:y��������Kw|�R�R��`�@�9yE9W0"+!��D2Y��p��{NWo
�*��TyT�<�U��n�.e��#����J.��1K��O���HR4�a
��o})-���o�7 H��n�{��8��i��D���?(�f�i\D����t%$�����pQ����'
UXYJp���4� ,�c�x���<4��`O�6����S�;cls��\O�u���S0E~'A&5M�$�,2OU�i��c�����t��Ic���8OBT9(��j|���������a��Az��B����p���UB�'!J8�=���`j�"w���|�78�VP*Q? �I@Pj�, X �>��+0����syX���R���M}K%��n�)
���v�����v��]���|qej��� ~���~
�[~Y����'��>3���.0��,UzD����kh�^�8E�H�������������!;����|h����d'e~���2N�9C;ZN�Ic�i��� dhYin��H&.���b�`��u8��dg
b~mw������Cd
� ?��� R��,mu���C��a��-�"o�"0)�/�5�8�E�PjR�l]9)��-�1�X>�*s�`]|����n)�����gT��1��[��n��E��i�����0�m��Tq�V����!P��s�p��(= ��/���wo��\�X�
z�r��B��H����3��.�L&�F�����DS��s\�Y���q#���1�Q6S���&��sb���O\�)��k�� ��g
,n�M���|�'�}'W�M�0c���{���CX����%Uu�<��&��C��E�t�\�<f(�I)w��h(��q����A~�S I���!V*���*�I�e5�C�9KG"d�G��6(�a��TQ��Y�z��jfH�U�
�; ��}�;X:�&�,V2�2l��� ����U�� ��3��6�o�$-M����_F�u^A�n������@��b���:���_��M�J6dL�;\&�6:i��byH�6/�t��*:b��Qc!HAPKw���r_��ZOe{��&�L�����M(&���cs&�`T���D�V��X5h�SL����R�iP����`�Q�%k����o�I�B�����G�?9��a�&S0���K4���v���"�S����F��EX���rs7�i|�N���S0�-d���]��Q�KX���+�^G����ne
.�*h���Zax�:�}q����qa��o��j��������M����78�����O���%�6.�1:]A��C�N�����"4����2�y�` p%�iBu�,��l%����s
i/O�}�� �����v��9��9�M��������o8V�H�i�:{Y[|��E4�!�=���DE�a���j�w3��{��e1z$�L"C��-K���P�{9>�V\A Z/�+S��J(EdX�� R��tl�!y��
<���� k���_�3���[�=l������$�����'���!4!*�V�������V�4M�7���r*������� ���<$E����1�#��aDx��8V;�=���m{{ ;�6�m�on �v/`na��� �#��xU���������-��D0�ET�#���a��gEZ���/����T��/���Di���@���{Bz��������zMq�XME������|�R��@�"����$BK�)��h��g��}�+�$ �h�����]���!��D��,�)�T�c��B�Gs�c�*()�t��pz����i�������m��g����V@�'5�Sg4��s��x�4�Q�����Hp9g1�o�H��i�vH�*�T�\���rf:=����1}��R�&&��@�f&I��#�&��{��0`��Cr�lI+���ej�q�Cbzh��Z���+(���(sa�+��,����V�S �2�]sR��P��,��������I��b���G����4�cpS��G9���Zy"A��R�����������U`�+��w;Y1��e��$nK��O$,w{�&��U���LiV�:�/v�=)��r�x(�� �]�m����T���M=%��Ar����d�ku��hu��l�0��9�������b(�Y�@��r�J��3��:�Q�NM���B�ESU%9��Bh��6��9���B�!��'e��{F�F��(�3��X
<���x;�Kp��F�k0��R$m�
!�d���n��$�B���@5���h%��Q�>>,�'�(��D�6�;��]���h$��{+�B����������b2�u�sa��!1='k3�C� ��I2P[-���&X��Za+�)r����4�=�������g�B|r]��S��N�����"�IR�3,D� R���-�O���H����zh
a��X'/�Y��ogZ�P��T�jx���+6�Y��m��?��x7r�^�|�e?@ �KW"I�7���M ��+9\�[4A�WY7���t%D���N�����W�=�z�_�����Y��#w~7�����b�D�)�'��N�����"�8����p(�����X�����?%�B�3z>�`y�L��pk�
Jv��__a
SS ��kc�6?�&^booSI3��D[n�Q&��m��f���Ud��9-d�'xQ�!��^=��f����{��._�k�vE��X��?�:�@�{����s����W�np>��Rf����f4~u7���{�<B�2/R��x����S��K��a>)��E�+��ZmUY�So!������X^��#�q=�B��n�O�p�]_�o��7������]!�kR�A�K`�$��R I�����,A��j�����������q��(����I�\'Im.���N�`.[lWD�$������{,-6]���n
���r(�"I���n�4H NC�(rq����/��+R�)�����2o���o�"z����l� �&Q�S4�R�m�j�p�G`��=y��+��Z������j�&�c
���� H����] {@(0��tV������� ����;�cA�}���M�|���"�h�Ja�{
��(A�<�`eQ�$xx1O%���7��]��&6Q`r?�M^�_�/o��`��H�2l��5^yJEO��i�b!�]��`26<i����T��Xq�A9�t��{2h�%��j�1�� ���@s�Z�p\�T���w#�Cjh7;�Nv�l_����F�����#4���8���WDN���5�-�u!Z-���wjj��}�y��a����P(����VN�&x�W�D�D�D�D�D����&��_k"(Z2}�.�$�>?�q�,���uZU��*I���0��F�h�/���F�2]����x-�{-���/����J�H��%�-p=��?��������;O����k�_$��]/��8�:�fu�+Yp-|�+�A9����-�5I=�]k~�I{�8��*v�P�
�s�z�K�
���C�(bh�t����d\8kq~
<��9�A`io67fl���]��'(��3�����������b_�}���A�j��^S ����A�
�}b�k��5���&��0�L�I�:�����y������ ��^W�j������Ss���<�������$��vxgs����-����-�xOF�4_�
��p%��B2����RXS�
��An�e�9��0k�=�N�kvu��+���?)����I������$F�����}}9��r��],���]@��'������c(�t�����~!����}!��HY�:r�`:bF��OE/�8�H ����������.�+xGEVl�'M���G,�gWDOW2�s�87��������
��%Og�u45���7���_��/B�u1 <�p�!e��T�0�������S���~�X��x�"p,W�$xw�X��n �"���86h
x���,�Un��I�=��������!�/��&W3��v�����W����F\B)B3�`��
Ku�����A�XS���i�Mv7X�L�������p��>q�Q��=��o��q�).�vc�>��{IrA�:��K6h�1�8�������'����v$�&��q��[��`5����-�]T����I��c3�B�8�)TY?l�^�_����}!���� r�"R�;����/?}���|6?�� 'P,E��T��g���[6� ��.��*OgJ�>�
���0�b`W�-��*����!����)�}�V5�b��
JrHLQ�m����,��� �"zZ�S~<|������n����H��Ch��������������lo�];@�T ���X���5<���
���%1�@y<X(0}���]�lG���_������3� ���G��<�R Q�r�FO����I��,|x�@���?��!������9�8�����O'���M_AO����Z>,����Ss�����rzr��'J!(�k�dw����XZ�n�`�R�%���]�$�x\�4+�Ey���,U%l�����������un��8i��D��{��ag�����2��s���/�M\���O\�k]t/Bu+�Q���=) �E��)�TIk ��{4��m������G�+>
�Uhj����QQnx�|�Kd`�
JvwhY��O�m�}���#b�wW������Xp$(�z���SUX���\*��j��R�jz)x�� �8`y`��x��J&�:�&�H
{�{2:r��,Y�\�
����� ��o���o2SY�O��C��O7��`�u8Y������+����*����������xab�����>�����ndd@�jh��8��?k�+loh�Z,���PD��@R�&��@��9���"��c���u�c�qS A�-�@��$��
%,���R���P�
1�*�]�����j!��JfQ��D����g��S�__����8mg���=��Z+���'��o��_^G�������+� �[�������Q��g�=�H|4���*�%K�SS�@�^�u2���'�������F���K0��#�BY^ ��A��]^�����Co~b^9����{���^���4��\L�\
��1��)��E^��W�����G`��
���+RQ�W�c��L�O�h|��X;F�4�6E����_���M��w
�C�,�`����'8�laJ������ "����$
�h�cE%!�2����� K�2�?M�
J���)�Ja��9�����I�����{5xq� �*0���Le�L���A9�*C�NE�h�
������x���f�D"we2?�g�}�[iSd�"������n� Y����~������jz���t>����vG�{���(e�26�RpH�&=�-�6��d�w���� 7A-��!l)�����[4��njSa����':3��B�,"+]L�Y���'�"��\�Z&F�r[���H��*�42`OA�{���l�P�x���[`�z"��G���������I� �4��Jd*�6T�"(�n� �����Q�la��S�\�]����c]�`��~����B�J$�/�D)�H�,j0{�0%�����_��N��<+�.��/yGA\��Db��ER�#x���lZ(����E�dj��o1;�@e
����,wE�(�rT����{,����R$� J�Q I�cb��`u8�{���@s�?���{��� �R���k4���(�
��X'�
���U+�\�8:nN�� �������7 �V�,��?��s�������<+����M�d3�k
�*�u(���y_S�y0W�X�X���(V��OI�~�5to���i^��.�������]"�v��6��]^6���5����|S�*�Y�n�TO���~��/���$���M=_����K#� �S�-B��o��4����%*��`��1���D��%^�!�m�nfK�b�Si�������o�����JD����I�Y�Z����K3
��CB6Q��)f\��=(����=�?� ��ZR����
>��h�
�'����X5z%8=�!���6P
�t����M�+-��: t6 �c��r�'$��Y�l�h���d9NW���`"���Hb���st r6�TQ� ��D:8��0�f3�D�-��
�`*I������~�����:t�$l�i6�-�����������^S�����O>��U�������p�u1d�jk�a�%���b)N��`e�1\���������,F/�I��S�h��=���7:j l_$T�E��< ���x�
"���`�v��` �������!����:����K��������X=M�A��������98�aW�Q��d���z[0������z�r�]�4P�����������Q��X�)��Uw$�O�bxu1c
�6����V������|��wl��L�Mq�LB�91c����Sx���5J$����b:8NZhO�}V�`:bn��#�"#�3z���6ivEX[��] ��Lk��������M���3�q�O�"��s���z�������5�X�)���3C:�Z��c<)�����)�j��22�����f-���
^<b����^R�]��r�����^�����7w��.���G>���\����#��`��o3�W\��|�&�%�k�����6��{�Q����#0k"�)���_���
K�6$)� �-���/Z_�}z�9��*
�|W�^���k%����n�$8�ln��\�%�:m�����������"�
��<�����
S�ry�G��-�}b�����+.�<�_c���,�B�r�go���N�ys`�S
�>���������������������W����\�I7.*����gY�w�4��4�'�}���%D� ��*���7X�Pi��k%�l��!M�j-�
��@�����qG���j �{��B���KWx��
!���ZMH����:�`��
� �n!��_ I�F?o�z��<�N��=AD��7��<�N�=��Zu��m)"��Oh��Lr�k�"�2�(�Rz_HO�2�}����m�4�4s V� z1�t
�MCp��C��:��K�����������bW ����bc����j�����������BU���(��qk��p�|F2��.��L���{2(R4l#J����)�^���b�P �>�� P���8��H^���� �]`�����)�<_�}!=�o�/��T�� ��;M
�
�����$%#��S�����m��Z�$�$���,���l����J������{�X�
L�*���L�/T��g~���;h(��H�����f[9������e���a6�*d���7'}��>�������/^Le����K���r�Hs�J���Zx���l$D���ia$�������2�s�F�A6}>� ��/R.���wD_��{��.%a��V��Q�������������n�zx=���� 0y��|�wJ���'wd���#�1@�&?S�=�}x�vE���.�d'�R
l
fO���{m!�o�-L����V�D`���J��tOF�"�E��v;������]����(-�V�+��u��>x�V��<���*��S�������B2���)��q =7an�{�t�VgwYG��D��5��a��� �h����Fl�d{�i~���@I$��PI %8^\���2M �s�4�?�pG�{�����ph3�'� �S�"f�L����B��@�(�H�������t�A���M�E���c�[������ .�!�D�]����2����������2������C�)�^6�jl���5����CAjh
��9vFf�m�����r�i@�� R�����,����e�CP���q4<K3z����l��X���5{`���K�P<�Fy
M�6�Y�Tf[���T�d� ��^�� x}d�rTy��
�d�3�)[� k��P<f��}4��:��;(����S ��.��.$3����#�';:j4��}.Kd��.�.����%����zp;������)*�9��[a��5��_^��hm�=z��<��/�>����6�"jE���X�Z�&r���4���\Kd��:7�fpw7�+)��S�:���\]�.�:�z|'��;I�]�d�.G��o� �Dkz.��T��k��Y�
��}�8U��2�sA�1�2����R�r���#�������r�-�z�<P%���O�&�B��f�d��,��p�aA��v�od�'42�@`=�������<����4���[L�-��Y���$ON�N�-�'�����.V���5�o|�h�T�� �#�f����2��������/]��N��gb���2�d��G���R ���"AS7`�
���~VeC�C���;���kW��X����SQj~�*��9Oi"0�\%���|��}��B��@��GK$��.� �v�!K���N?$��|�����v$�yYh�,��{a�������.�X�����u*C�f%�3����-iv��e��Z��i2��%p\v����|��;�'�����C(
|��R�"�x�~O���p����2�m�=!������{'C�N]"(�C����:q~>mzc`����� ��+(AN��}�?��#I~��������k,��o�\��W�$(���+u8��$*; ��jp[5���H��R�~�54A����=����O�N��'�DD����Z��������2��������H��k�s>�jU^!@��(�\��Q��i
�e�Fo�d�����)�@��;��x8�,����;;6X�4L�gJcK����/����0�����K
�@6����u�%%S;������88|��(������qu��#���iEv� ��T�Ij
R ?��h�`����K���Is5}���e_�#�+I��H
�;�h�Y
N�&�m�+(E����bQ�����8�Mp+8� 9�����gX����6Z�w��x������0����s��� �i�Fx�Y�_������?a$
:�����R�,�K���i���w��t�E�.���������_�$�E_ep0v�>�?�V��6x����/����3����7�|�^��K��B;�i��rm�fmYH��
���hk+I����0�a<�]�`E�B�'�����s����� n�d��a��(��
���`��3l�Nc����\JI�������}z*m�J�pw�
�>��Bvj����p���1tM 4�����"bR~�
���'j���dX�.��Og4��?��P�Z��'�Ji,��+����XHh ]%�2_��@ql��!p�y1v�@B`�3Q���@��S<�~n��M�3-<��Y(�����*�#�.��=r_0�����$P+0��XO#��]��n���Z�E�*�����$��:[ 1
�p�����b�4��_���9K��Ta���r(^��e(WN�2�%MM���e0�M�{4 �OR��kS E����Sf�8��{2��`���T�MXI�M�v�Q�=�-��m~�4��W��Bt�UP���I�q��L�Z��CS� UWb#�<9Qf8EY:X�NS�6����l9��R�sv��A��20cR�G��'K�� R��GK�?��T��G�%g��F�^�����>�
���`�JD�o���U:Zu\;W!�G��pu�>�o��:���Sr�x�f����F=e�B��)O���
�R��
|��v�����?B_� �d�g�9F$
��L�T��)�2,|�+�}�_�}~��!r��E\���'��;�����Ow|���.���p�H%8��8����~��E���w[�=.`o���3g����<�"�'$�O���e���������N�~��vwsu����/�-�w]@�@
�l m�T��1_�`���'h�et��5�*����u�������:}������ZJ��(@��NQ�`��:����7��q]OD�x�
�b�yz���LA����>;C�p6�T3�3=�f>t���+����d�����p���h����+�X�9�������0D�E:���sp�� }��?�������M�"O����d����X>���K?���:kF
@S���\^O�h.o%�c�C����+�dr��c|n��W�7��l�[��s�FEm�,���.�'��=<���dP�dm�d�DK�Sk�=�����������e�%pX�%g��K��m� ���AP�R0*]����i�h#h��&�kn�3�K$E�����09����b�;zI����:�
�>��������_�?B�������x!�ky��tR�P�5���&�X�A`�3����B�@}�4 1���S����D��\7g���w����������
_��XT������.��c���(��Y��� �c��!��Y��.I;�����wX��
����|�(����L��|l*�L��A�j����[��U#�����R&&�B�d���{��&��r��#�$�,�!�^V�)�������h����clvwO�,��3�k�w$�RtfHBPFK�L�(�5���Esb�|�0���R��*Q
-�3}�m���>?���$^�I��"(���.%%�������N�b��
JQpdt;22p��vPE��}T��tCO6�oQ�;���}y��������D�W$sg�F#iQ�x<���P�[#o��;�x��N&�o����C?+F�����������-����6<aGEB,���
�Bs���
hc�����lJh���;�1�����Q�_�
���R�$h>`^�(��;f������N��rhB LF���5�]N��5v/�Zc-�2s�7��C�#����6;������[�Vg"���Li�$GM��z0���.$�W��0�9aBk����c�XN����vs`����v�"Ho�Tz��E�[0E&N,4�a������ ��.�_����/o�����7����!1=y�� ,{���@��+��\�Iw���7��q��X� �hFC���l<����#��2SZ�y�h�C����+�� [�.>9�t�0���t��`��JB
eD�F�
Q�
��7Tk�l��d�X;�#I5GR�dhk1vR�8��F��F;�B/�j�x��U�Al��N�,B��%����M�t(��+��;��5�Y��0O|m���fr���b�����g�"os�,�G�LS�Q9 #y|�M�(�����J���jD��W�2�p�������.����M� �F_"��4��a����ke��Na��)����)3�I�n
�H�J3�����Zn)t��WoGYRsX�]���3K��`)hM����U���vE��S�y��SS@/,��F�����!K9'�����X�O����5���!��%)�w�M�#����!1y"������ ���lo�3]������s�l����`����b����j�n��i������ w�����|��t
��o�����Y��H�@$1�`3|�x%|����>�%���G�BFg�Qp�6h����)�v��o��=&��
&����{��Q�0�l�����
��X��O�5��8��suuZ-���}Js����V"i�#�����m�+H�G������`�j���/��
�
K�c�L����Q[-����'�u7|i���|8�T���@C�.9�@���L ��[0EbQ�`��x��>�� 8lgL`��
�����G���'$t;���?b�r ���{=[K��� G��������`��Ls���)���*iT���ao�T���C"�����K�I@���L>�q�e\������?[�������
��Dp�B`�J$�i�i-�tk�MA1|~4MVo[v�bD�F�����P�x`�Rp�m�������������_ jux�V-H� m�I�r��;�FlH�������.���h��upl�C��v����2���'��.'/�"��H�<�a���=,������no�� <��S
0Nj��\�<�S~wD�<�(�L��� 1�Y ��WX�5��}jo�R$j���*�B{��z���Rd8��E�-���"$���zj
�J������o�{s��WM�Tik� ~����1�b jS54��&7I���`)V���[A{�
�F@�W��I ��
� ����#�D7���5�$U2l�ZA)j:9�XM��[�*,En`W�H�Xa�r�����m`'{aC�H���
Sz�m�(�z��X)�!o%����NGd���������O4BN��W�h���V�I�,����jh��U��W�k��������
;�2��\
<�.��L'���T�on�����Cb~
�LF�h��j
?)����~"����F?�H ��eSt�rm�+D���:��@�2�5�
���uxO
l�4�����t,�^KP�������Y�`�W m��7�@�,�@�K�����PR ���RD�����"IP���{P\��?At,��������UvA��|k��>$�"]Wr��(��)�h�_
�>������/�����k����b������������?_8ME'\�IT��p�3�k�?��������y��:�`�G�;������7`��JQ?ld��%���N��N�}Bk����P���:����T������/1:�����~��l�����`m� �IQ��NZ���*L����rc&�K�`'��0WG
�������be4�����v7<��]~�u1�A��{Y�[TP�Ji�i�4�)�|'������6���F��K,}�_S@�/_B�,������Q��`�}�����h��1�`MT]���C+x��6���[A����'��SA �P�<�gh��MQ�f���?OWBD�k��{�P�������'�}�s����C�g�d���T��h6e���q]3���lt���n;VC����fb��k�&�m���h�E�f������s8��E��r\Oc+��� ��U�b�F54����+L��8���Lhr[��:'��#���#����Nr#�B'~L�Gj�4��P��n�q_�}�8!���^4�qz���v*�`��I`w�����z.XQR��� ���8vf}�!�{5$P�*d3m��z pS��q��
UOp�p��3.�z�w$�u����sRO���]ba���#���t��|����H ��]`���H�ay����o�I�v�4��Ybi�e,�I�Rk3�#�}����k��,�4�!�`�@�����x����^�]�g���?�-�H�I5s��`��L�����)�f��z�`P�YR�Kp �����9��������X��e�������V�p�" l"����-��/���1fUP���TK���K�8P!]���Uh�-�|D�F@��`�T�G
x�������4�#��&���-]:����xA@n��j6��`�����\]c��LE\�-�}b����8������?��_����������������� ���`�H�Q����_������+�n�9�m�HDi� &� ?�k���m��l�c������ca����|�����L����n�J0ib_�^�kD�Fq���&�6�B(������g8�4�����Z/���o�����w�c�#���&�"�����9����#`���A����FQ�����q$m��P�����G��� ���L��� @s�!�|��}!E�F� x����u*8���^���H���c.��������zp���sT��i�I@e� ��]�2���G�7Xw��&���B+�*�"�.����o�������H�ZNkES �_'��T�S���������_s��VA�.�r�����RrKs$����P`��-�"�7)|w`o�&�J���� ����<�V���
���U�4X2�@b�u����_�3�m3 ����)Q$�U�=�8����6g�iS��������|�3��M
ME��WK�+I^-���+6��mK���Tljs-E�8��35��2I�A��w���~�i`'��������|�;2����Z�T�"�#g�G��
���2I�d=1���K�f*��h�����
�d3�s�����[��dsk(��`�2�d�����>^���w:�
��g���f��OHR;u�l�Z�g1O��?Wto����������
]a)tfc�D���A\As��i����]�$g�$gT$c��;�i�,��@���%�f\��d������})�dw�����������lSA6���6N-&7��� �~��q��T*��fN�1hgS>)�'���
�vM_S��wu�IwW�
:;�
����{��<������X
l����O�n8�����n./ ~M���7�7��s,�[A{e��z��|J�����;WPwT>��.�P��J�*o$xtNa=�p^����������Aq��" @�K]��@Aq.'eh���@�� ��''�eY��u@
�:��(�x�
��{�W�,�J�}��Y���� �Z���J$E�)("�H�<������+fsl�������mIb'tG��8e�D�S�j wE���s�/�=J���N����`OtO�n�����9�}B?����h��=Lz���u�\*�5��K�������n?��%�Z��D�����?aGAQ+��\�U�/�>?����B�i���h���������
�=��HqZZ��$���`��AI[@�@pp�J�%8��ed�M�L<z"���>~|Y�vv���R�%h��
���VZ+��o��4LWw���������]����������k��V�=-�w1����b�]�XB��4y �weS9G�Q{2HV�i�'�������<x��p�:N����5SgP)�w+�qzc�+a��������lGAk>���iw$���%f�^��G�^'����k7��BH�\�uh�M��]{2(4�$F�
�����VY����h�S
X"��k
$��3�wE�*�s��������>�Q= l���wE�����LFh��-��X|��=���`�J�K"����eq�t�^�d?!��;�cn�?�}?�L��S��t���.��Vi��LN�yr���=)d����R��TG X(DS&$�i��f���%���F�<������
`9�m�����`�,UE?�I����M�S�(ue���C+V
j��M�3�I�}���UJ�T��RX�}��3��@�@ �0����9�G�,��c�Jd���\]^�����sZ����'<��x����u8E�*���V���L������)����%F��D$�0dv�w����6���bL?��A����1r%�'��3L�o��kO��i;l�:���+��Y��I�(;7��d��jJ5D�������'���B �Db{���:I���}!4|���k��@t���(�p�
���o���O�� ���<�I��l�UV@2��/����Cb4��+�$Gki�E�����F]C��w$�O���v��q-�$J���>Kp��=�Z�X��g���Q�])=8X:�%��q�2���Y��4}� ��|�vpCZG���6���]�v���y<� K����f�����+l��S�(�����m�*O�"!����X�Be)��e_H���� v%��
��1 ^�Z��`�vOM�MFH�s������jT`���'��"����`�����b�:|�������8�����"I���9����g���*lOjN���l���t�4x������L�&�S�wN��{�;�Fvp�:H�c�h����['S8B��4����u,Do��<�0�����w�t�;��rzy�
bVC�O-�Y���y��]�$�9RP��mI]A{|tr���t���f�2w��9�����'>��}wEP$��(e�Qx��;�ev�i�>A5��(,���(��'���F�D� ��|����
�#=�h���!�����*�-����9� q1�R�����t �{���`��^��}=�|�;v��y�Il�9�y$��-���!�O�z��xz!�Y�f�`�J��j������':�u0tl���0�}��N&��qy4D��^v��Z� {�UP��B����P*1+|���{�������he�u*��:,A�Q��V" ��cM`��X'�������`<�z0B�v$��w����y�=Bs����
����*
�R���O����a~���"ez���]A�'e����5�i��=�F�$��7�-��>�-�'-��F��P�>J%f�l��Dh����=~�E�&�>
K�l
x���<�.z��e�:���������/8h��[��� �������.���z-�F�Xv�i�@����S�1�� n�,IU��j��F�OHz�����a'���w��x��G`�����I���]�e�7>�& ��3�AJ9���V��3��h��9�j�����N�Ze�H�<:��9~WE�h�@�f��a����[���<�)�D���K�~����|���A�9�
�W����x��d�>�W�1������/�G�=
�J�c���x
���VI�r4�_�{S���#��&��6��cJT�L����� ����L��'��@������~q�9�-��O��?�F��g��OtNt� ����'�������9g���c��X����j��.�������Y��:x"z������ ������s�pWAa�X�������{�T��y�G_r/��I(0����p,����bDyc>�I[����3+�����#�yU��%�Z���O���������������` ������TX�����L���|��r�6.�T��z|���8���`�Z]7w�nx{5p���Rz|��n�Q��;����fp�~x}�-�LG��$xG�H x���%f��9�2�������h��c7���B����W"������F�����I�;��1S8�kWF�4�����D�D��������]��)vp�:H�c�������0hCl��Sqj4�w����&����5��t�X��5+�� ��,����)�L�*�VX8������Ft6�F&�c=�#:x���u<D����H� e��;v;,���d>+E��Cb�N@����,P9������{��������eO9Mf����,@���/�@�5��;���O�eW.�����������d����X���}j���
i1������'����e �c�D�O���lG-{�C������!������_�,o���F�:�b���l�c!��W�c��t�
�j-��S����e�/������v$�gZ���:����R~���)x�Fe����/��V�����.��rm9�����w4+�=a��9����������=���
���4Q��"m��Ye�<�.����n!�O�f�"��.�a�,���d
k�@�Z����[���)����c�w%�����\{6���Y\����� Ty�������/�}�3���Gpve�_�������5� uv��W)C�m�Y�Zs���H���E:<`>%��u{�6}�������z��BND�O4�B�I�>�
Lp�������G�A����:����7A}M,ABOr���V��x�m�%���@�p-h2�J��:���#�w$
F����1x�l�4��f��S6�J�-9 ��j�
�����F�6����~��!�-Gx��������w$�O����!b9���"�R�T)L1��A��`6�%�K1����D��T���4j1K�����M�n��AGB�
B��N�:��N4:J����o������8�BcU�$�V7���������O�t4:k��D�D�w�����h��j��S�5�%QC������adaE��
��A?��1:�����Z,�AoW4���W�v�^��Rb�Y�5tz=����;�Z�9�T}�g` ~�m��7je$����|�3_j��
Th�'� =9o�
�z�� �?I�}��2��i�g�:��`�`%�_s-���wk�{HL�di��G����sV���gJ�?����}��g�>�����G�/�>;����V2����������O��Il+���KTi��tYm<�_q-�jgm���[�L���ls AH>���LE����}L_9 �d�|�J���j��$;t��h9� qZ&yC�T�9:�|O���<�����l
�y��_�|��I����B<��i�
=� K������ �y�x����yr���j�5�<�����G�;w�Wy4k�1�C�4�j���7k(����3��@�G��-����D�� �n-�2�&��o/�}�� Ae�'X�Erl�X%{I��]��O��#SCs`��^|��O�k�����L�|1�Y����������������g��}�=$���u�����w<�v4D���*eI�#�9�x�$��TssYl�=�m<�������tpH��r�u���Z>U ����TKxS��w��Gs�t.���[lk�]ND�Er�{�P I����I��I���R��<����)�K2���c��������BK���p*��������& ���cl�O��PX�"�;bn��<'f�(�Grm�=>���h�N�zF���be~?1P��MR��`f�����?��LiV��B�������'s m��.�i�{B(R��f��
�}ZK���P����0~�B�'��U�~ �;(\����;��rH���C����tHW��T1p�)���H��j4FG�l�$}�"���������CKN�R%O<���
���-�t�9g���������k�F��6ZS ���aDEqs����@Fs���O���i�x�P�����u�%�Y�5��xg�J��fb���������S�6X���Xt�������(���������,��Q
Nj��Q���7�AY��D4LS��i��q#����`�,0�A�h���X����XVS ������x�:�X�i�l�U�<��fcy�@��=.�zz}�wPQ������!J%��&-=0��+���mW���?�Z&`,dOF��T8S�*O6I��c�<5�X�B�z�h}RU�<����]��S��k����*){o�/�vu��Zfxm�d
�y������R����~@X�����������Q�
��G��x����?SB{��\�l6�<�R��&M���9T�4����h�t%�B8�����91���u<DO��Oz8�C�����6�jh�d�$�Jp1�S�]�%�$�<���Q`������6����+���!��1K$A,5�|19��
#a��C�� ������v����o�����\�3����Q�eP[t�5�c!�6��v��4�����[ (��$TQ� ����J�q�h���VN3����:h)��ME�b�K
<��B�Ia LE���`{|Eu�%����f������}���������l�7h^��d�����n���������W�}!dqx��hS ���wX�������������h�xWD/��Gs�yZ% {���g�]�N���
���rjU�+���E���va�J$��oK��<q� �2�Y��>����:x7t_.�!1�X�"z� �u�~��y��P����/������|q��'���$T {|I����"W ���y4�h�vPI�����`�H��O;��)@��aMr`��G��:������t�����V:�3&Q
���g�6�������r`��..G����l|w������'���e.��������
�3 N����Z���� K\�J�Y 86w���Y�����;v�r��� �g��{�t�.[J_`�fK$��V�(���[���Z��Z��t.�Ka����u�F��@$1�K��}~�GC�[{��I����M����,?4�q�GZ?n�u���i�6�Z�"J�L�Cr���h�:�}rwh��;�v�_�}
��,�# �PA{|w���s��;����� �uq�]��y�E�, 7{
���"��
�T7X�S�'��� �d��y�$a���G��,�
�hJ�%E'���8��;�3�k����[u�����taWD�h�f
��*;���)`%C�"�R�0��>����� ��� �jn��r F|kp�<V������Bq�$�9�C����AD'G��_�^c�Kp�+�Y���Y1;������b:�K=�VL<
/3�X�6`6.i�I��? �?-�����B�R������S(�(���4S;��,P6XH���K��Ya�����ho�/�>EC�=82VZ�v�,�<|F�)�R�H�7D#]��8�%���P�Y���VP2!�������z4D;xt����r �
��xe>����:FN
R�T1��b���p��:0�m��}j�1Q����� �E65Y�d����kWE������jh��<
���H�$Y�����Ip�W`���'*��h�n
�c5�H&1�����n�?��CMg�9MH��=j��5��j�E�E���Xh�*0��)� �O%����������
�g]��x3��l�����#�����0?���W�~Gs�������0��I��h��B�>[��c���5��,����>���1���R�r`E������k��kk�ny���:��h�J����n��,P��^" R�t
E�N1��0-�DRl{�]���<��<�`dv���Z����7��r�%�r�
Q�5�
�K$�Ac�
Af�q������fZ�y�� � !�?�G�3��`�:,T
9��BN���&:U*<rL�> ��Y
�_����@��C�ZQ��r���h���y �z����"F���M1�o~�����J$�g��j�{|\t����K���:���d�_,#q��\�2��M��pT����i�2�z����!������yF�7���*3�b�CP2���T�x�al�tK��>qT���Fz
<Y�
����x�����9����q��|(,�wO��!��l�1����rK<z"v8�6�6��cpD��"7qd7��'#�P0gp�(�����|�Y���O���w��� f�b����pNyN�Y^�D��_�
���Q�Bm��p�d��"�5��2��\L��M}:O�����h�v��(�V���GZh�_�������[ �|�m�l����
��M�i�S�Qe�%���2]�X��T���(��a��d���q�X������RC Qe5s����>�?'�+\ �|#
�n���GsiO�_�9����&6�{%���c��|%�K��#��9tt�8��5�{�(|��� X?Xa h�Prp�K %P���)r�����Ns?��g:��u��|�N���=��u~c����o �w� �X��\���ix���-���Ih��.r/[Oq�Gs�f>xZl�=8�:���Jw�FFM��v&h�r����`��Z��'~>��j_����� KPq�%����Qf��� ���cU�h��As���)� �$u�N&j
���
y�����1��
|���<���h�/��9�,�S�\�����4��Kp>���Tsx� ��k!x�c��1K��{X�U��O��?A�,���7v�4������+��_e��mq��0��pp���D>�P�A����@�z�� +(�}���
� �<����+����}�'�Gs8AOF�4�������5���G�����i�������B�������-�(�
�_�����d�9�n_H�/��i,G�n�������:H!��-�����#h��H5�U���O1����R�}KJ�������A�Jd/(9��\�2Q�R��4�6��B�$��zHL�/����E3��Qu8Au���j���(�G�P+v~w7�O����������������z��q)�Qw@>6]9l����n�����,BO�Ipk��
�n�=����1Ox*@N�K�`7qI4��hN�c!z�C�,'��[�#���Y� `�8X)��;(��1��t��G��6���;\����������wh-N
�>�����wc��p_���{R(��;6����Xi�����`�
C�XR�aA;�n87!{<�~XP�o��!�-U5�uzwv$�:/
Cg�?-��/�R���MT��W��qe���*�������*���+Zs~����f�"���f���/�^����Q�!�Y���@yFE_3�q�������nLE�>���W��,���>J�����������������;�~.p:U�r�����a�cXC�x
O{� ���X���8�HJ�$�H*�,�I�o�a��G��4������!���
�/f<Rf�3�IDe=�#�N���p�$x���2�AS@���=T��P5jw�j�{�J�� �W����[c��{�A94��)O1�[p����!�Z����h��H���3��=)�l��$���6�$�x
�jVB�_���wSs���W_��3��4J�Lz�����m��l��@������eI��T��#�����Gt"�an�:�}r��a�5D� Rb0#FC)V��U"{|�u�]<�u�EA�J$�}������k�0�����;Q�Y&�����u�}:6��XO7��y�'J'J�����s��� ��hp;�^^��xu8A�c �Z�H���
���>M�r��P�>��2R���HC��
���` h�u���%/'? ��O��&7G���G�B��vkw�����wvl�ic7jNR�#&�9���,5x�����I65�����fW����L� L��� ��*{�j�x�n�H`!t�����@G�o�9��s=B���W�s��` v�0���l��+�+WPQy�����O�+�h�m��*�E"��Y$��0�2N��+����z4�\^c��:��U�����Z���"�O��q �����V�o�����y�`(�|�=������7J�e
9���c)�}B�!vM�������*�4��%����XVA ���*�D�M<��"(k�<�kK/6��f�u�V5�.��VF_:������n@z4�5�hw�I��OhMUP�����z��X�{Xa���x9����@GM�Y!���Xa�rk�y�j��@'����^
|�l��_����X�Mqvn�\.��"]������U_3�Z��hL�#�D;�xO������;Q�9��������:u�R�J;�8�"(2#W`b����wG�?�u3��m�eS A��k� ;x�}\Y��>Q�wHL�_
�W4s��X�b� `��^��u�m�{����h����'J'J'J=���x�Ov�Ju|����!��{1�v\:���&y5�qRA |��b�|��9���zHp������^�'J'J'J�jT�k#s�D05cJb�C�c�Ki�/�=^���������;,�T��eQ����� 6����a�H�Z!MK��>�L���=mX�CcmX*,I&�5S_R#l#p�"��U� ����;�v'J'J'J=�t4h�6J�#�g�a$z�pO���R�
>���X�����)�-h� LY����,�����1�X<�'