Handling GIN incomplete splits

Started by Heikki Linnakangasabout 12 years ago12 messages
#1Heikki Linnakangas
hlinnakangas@vmware.com
4 attachment(s)

Here's another part of my crusade against xlog cleanup routines. This
series of patches gets rid of the gin_cleanup() function, which is
currently used to finish splits of GIN b-tree pages, if the system
crashes (or an error occurs) between splitting a page and inserting its
downlink to the parent.

The first three patches just move code around. IMHO they make the code
more readable, so they should be committed in any case. The meat is in
the fourth patch.

Thoughts, objections?

Alexander, I'm sorry if this conflicts with your GIN patches. Feel free
to post the latest versions of your patches against the current master,
ignoring patches. I can fix the bitrot. That said, I think these
refactorings will make your code look a little bit nicer too, so you
might want to rebase because of that anyway.

- Heikki

Attachments:

0004-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patchtext/x-diff; name=0004-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patchDownload
>From fdd15f7dfe1ca66efba5eb4724d08574aa3e02ce Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 7 Nov 2013 19:38:01 +0200
Subject: [PATCH 4/4] Get rid of the post-recovery cleanup step of GIN page
 splits.

Replace it with an approach similar to what GiST uses: when a page is split,
the left sibling is marked with a flag indicating that the parent hasn't been
updated yet. When the parent is updated, the flag is cleared. If an insertion
steps on a page with the flag set, it will finish split before proceeding
with the insertion.

The post-recovery cleanup mechanism was never totally reliable, as insertion
to the parent could fail e.g because of running out of memory or disk space,
leaving the tree in an inconsistent state.
---
 src/backend/access/gin/ginbtree.c     | 529 ++++++++++++++++++++++++----------
 src/backend/access/gin/gindatapage.c  | 207 ++++++-------
 src/backend/access/gin/ginentrypage.c | 125 +++-----
 src/backend/access/gin/gininsert.c    |   9 +-
 src/backend/access/gin/ginxlog.c      | 482 ++++++++++++++-----------------
 src/backend/access/rmgrdesc/gindesc.c |  45 ++-
 src/include/access/gin.h              |   1 -
 src/include/access/gin_private.h      | 108 +++++--
 src/include/access/rmgrlist.h         |   2 +-
 9 files changed, 835 insertions(+), 673 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 7248b06..41e460e 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -18,8 +18,16 @@
 #include "miscadmin.h"
 #include "utils/rel.h"
 
+static void ginFindParents(GinBtree btree, GinBtreeStack *stack,
+			   BlockNumber rootBlkno);
+static bool ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
+			   void *insertdata, BlockNumber updateblkno,
+			   Buffer childbuf, GinStatsData *buildStats);
+static void ginFinishSplit(GinBtree btree, BlockNumber rootBlkno,
+			   GinBtreeStack *stack, bool freestack, GinStatsData *buildStats);
+
 /*
- * Locks buffer by needed method for search.
+ * Lock buffer by needed method for search.
  */
 static int
 ginTraverseLock(Buffer buffer, bool searchMode)
@@ -53,7 +61,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
 }
 
 /*
- * Descends the tree to the leaf page that contains or would contain the
+ * Descend the tree to the leaf page that contains, or would contain, the
  * key we're searching for. The key should already be filled in 'btree',
  * in tree-type specific manner. If btree->fullScan is true, descends to the
  * leftmost leaf page.
@@ -86,6 +94,13 @@ ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode)
 		access = ginTraverseLock(stack->buffer, searchMode);
 
 		/*
+		 * If we're going to modify the tree, finish any incomplete splits we
+		 * encounter on the way.
+		 */
+		if (!searchMode && GinPageIsIncompleteSplit(page))
+			ginFinishSplit(btree, rootBlkno, stack, false, NULL);
+
+		/*
 		 * ok, page is correctly locked, we should check to move right ..,
 		 * root never has a right link, so small optimization
 		 */
@@ -101,6 +116,9 @@ ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode)
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
 			page = BufferGetPage(stack->buffer);
+
+			if (!searchMode && GinPageIsIncompleteSplit(page))
+				ginFinishSplit(btree, rootBlkno, stack, false, NULL);
 		}
 
 		if (GinPageIsLeaf(page))	/* we found, return locked page */
@@ -188,67 +206,62 @@ freeGinBtreeStack(GinBtreeStack *stack)
  * Function should never release root page to prevent conflicts
  * with vacuum process
  */
-void
-ginFindParents(GinBtree btree, GinBtreeStack *stack,
-			   BlockNumber rootBlkno)
+static void
+ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno)
 {
 	Page		page;
 	Buffer		buffer;
 	BlockNumber blkno,
 				leftmostBlkno;
 	OffsetNumber offset;
-	GinBtreeStack *root = stack->parent;
+	GinBtreeStack *root;
 	GinBtreeStack *ptr;
 
-	if (!root)
+	/*
+	 * Unwind the stack all the way up to the root, leaving only the root
+	 * item.
+	 *
+	 * Be careful not to release the pin on the root page! The pin on root
+	 * page is required to lock out concurrent vacuums on the tree.
+	 */
+	root = stack->parent;
+	while (root->parent)
 	{
-		/* XLog mode... */
-		root = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
-		root->blkno = rootBlkno;
-		root->buffer = ReadBuffer(btree->index, rootBlkno);
-		LockBuffer(root->buffer, GIN_EXCLUSIVE);
-		root->parent = NULL;
+		ReleaseBuffer(root->buffer);
+		root = root->parent;
 	}
-	else
-	{
-		/*
-		 * find root, we should not release root page until update is
-		 * finished!!
-		 */
-		while (root->parent)
-		{
-			ReleaseBuffer(root->buffer);
-			root = root->parent;
-		}
 
-		Assert(root->blkno == rootBlkno);
-		Assert(BufferGetBlockNumber(root->buffer) == rootBlkno);
-		LockBuffer(root->buffer, GIN_EXCLUSIVE);
-	}
+	Assert(root->blkno == rootBlkno);
+	Assert(BufferGetBlockNumber(root->buffer) == rootBlkno);
 	root->off = InvalidOffsetNumber;
 
-	page = BufferGetPage(root->buffer);
-	Assert(!GinPageIsLeaf(page));
-
-	/* check trivial case */
-	if ((root->off = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidOffsetNumber)
-	{
-		stack->parent = root;
-		return;
-	}
+	blkno = root->blkno;
+	buffer = root->buffer;
+	offset = InvalidOffsetNumber;
 
-	leftmostBlkno = blkno = btree->getLeftMostChild(btree, page);
-	LockBuffer(root->buffer, GIN_UNLOCK);
-	Assert(blkno != InvalidBlockNumber);
+	ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
 
 	for (;;)
 	{
-		buffer = ReadBuffer(btree->index, blkno);
 		LockBuffer(buffer, GIN_EXCLUSIVE);
 		page = BufferGetPage(buffer);
 		if (GinPageIsLeaf(page))
 			elog(ERROR, "Lost path");
 
+		if (GinPageIsIncompleteSplit(page))
+		{
+			Assert(blkno != rootBlkno);
+			ptr->blkno = blkno;
+			ptr->buffer = buffer;
+			/*
+			 * parent may be wrong, but if so, the ginFinshSplit call will
+			 * recurse to call ginFindParents again to fix it.
+			 */
+			ptr->parent = root;
+			ptr->off = InvalidOffsetNumber;
+			ginFinishSplit(btree, rootBlkno, ptr, false, NULL);
+		}
+
 		leftmostBlkno = btree->getLeftMostChild(btree, page);
 
 		while ((offset = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) == InvalidOffsetNumber)
@@ -261,50 +274,146 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 			}
 			buffer = ginStepRight(buffer, btree->index, GIN_EXCLUSIVE);
 			page = BufferGetPage(buffer);
+
+			/* finish any incomplete splits, as above */
+			if (GinPageIsIncompleteSplit(page))
+			{
+				Assert(blkno != rootBlkno);
+				ptr->blkno = blkno;
+				ptr->buffer = buffer;
+				ptr->parent = root;
+				ptr->off = InvalidOffsetNumber;
+				ginFinishSplit(btree, rootBlkno, ptr, false, NULL);
+			}
 		}
 
 		if (blkno != InvalidBlockNumber)
 		{
-			ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
 			ptr->blkno = blkno;
 			ptr->buffer = buffer;
-			ptr->parent = root; /* it's may be wrong, but in next call we will
+			ptr->parent = root; /* it may be wrong, but in next call we will
 								 * correct */
 			ptr->off = offset;
 			stack->parent = ptr;
 			return;
 		}
 
+		/* Descend down to next level */
 		blkno = leftmostBlkno;
+		buffer = ReadBuffer(btree->index, blkno);
 	}
 }
 
 /*
- * Returns true if the insertion is done, false if the page was split and
- * downlink insertion is pending.
+ * Insert a new item to a page.
+ *
+ * Returns true if the insertion was finished. On false, the page was split and
+ * the parent needs to be updated. (a root update returns true as it doesn't
+ * need any further action by the caller to complete)
  *
  * stack->buffer is locked on entry, and is kept locked.
+ *
+ * When inserting a downlink to a internal page, 'childbuf' contains the
+ * child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
+ * atomically with the insert. Also, the existing item at the given location
+ * is updated to point to 'updateblkno'.
  */
 static bool
-ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
-			   GinStatsData *buildStats)
+ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
+			   void *insertdata, BlockNumber updateblkno,
+			   Buffer childbuf, GinStatsData *buildStats)
 {
 	Page		page = BufferGetPage(stack->buffer);
-	XLogRecData *rdata;
+	XLogRecData *payloadrdata;
 	bool		fit;
+	uint16		xlflags = 0;
+	Page		childpage = NULL;
+
+	if (GinPageIsData(page))
+		xlflags |= GIN_SPLIT_ISDATA;
+	if (GinPageIsLeaf(page))
+	{
+		xlflags |= GIN_SPLIT_ISLEAF;
+		Assert(!BufferIsValid(childbuf));
+		Assert(updateblkno == InvalidBlockNumber);
+	}
+	else
+	{
+		Assert(BufferIsValid(childbuf));
+		Assert(updateblkno != InvalidBlockNumber);
+		childpage = BufferGetPage(childbuf);
+	}
 
+	/*
+	 * Try to put the incoming tuple on the page. If it doesn't fit,
+	 * placeToPage method will return false and leave the page unmodified,
+	 * and we'll have to split the page.
+	 */
 	START_CRIT_SECTION();
-	fit = btree->placeToPage(btree, stack->buffer, stack->off, &rdata);
+	fit = btree->placeToPage(btree, stack->buffer, stack->off,
+							 insertdata, updateblkno,
+							 &payloadrdata);
 	if (fit)
 	{
 		MarkBufferDirty(stack->buffer);
 
+		/* An insert to an internal page finishes the split of the child. */
+		if (childbuf != InvalidBuffer)
+		{
+			GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
+			MarkBufferDirty(childbuf);
+		}
+
 		if (RelationNeedsWAL(btree->index))
 		{
 			XLogRecPtr	recptr;
+			XLogRecData rdata[3];
+			ginxlogInsert xlrec;
+
+			xlrec.node = btree->index->rd_node;
+			xlrec.blkno = BufferGetBlockNumber(stack->buffer);
+			xlrec.offset = stack->off;
+			xlrec.flags = xlflags;
+
+			rdata[0].buffer = InvalidBuffer;
+			rdata[0].data = (char *) &xlrec;
+			rdata[0].len = sizeof(ginxlogInsert);
+
+			/*
+			 * Log information about child if this was an insertion of a
+			 * downlink.
+			 */
+			if (childbuf != InvalidBuffer)
+			{
+				struct
+				{
+					BlockNumber left;
+					BlockNumber right;
+				} children;
+
+				rdata[0].next = &rdata[1];
+
+				children.left = BufferGetBlockNumber(childbuf);
+				children.right = GinPageGetOpaque(childpage)->rightlink;
+
+				rdata[1].buffer = InvalidBuffer;
+				rdata[1].data = (char *) &children;
+				rdata[1].len = sizeof(BlockNumber) * 2;
+				rdata[1].next = &rdata[2];
+
+				rdata[2].buffer = childbuf;
+				rdata[2].buffer_std = false;
+				rdata[2].data = NULL;
+				rdata[2].len = 0;
+				rdata[2].next = payloadrdata;
+			}
+			else
+				rdata[0].next = payloadrdata;
 
 			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT, rdata);
 			PageSetLSN(page, recptr);
+			if (childbuf != InvalidBuffer)
+				PageSetLSN(childpage, recptr);
 		}
 
 		END_CRIT_SECTION();
@@ -317,13 +426,23 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 		Buffer		rbuffer;
 		Page		newlpage;
 		BlockNumber savedRightLink;
-		GinBtreeStack *parent;
-		Page		lpage,
-					rpage;
+		Page		rpage;
+		XLogRecData rdata[2];
+		ginxlogSplit data;
+		Buffer		lbuffer = InvalidBuffer;
+		Page		newrootpg = NULL;
 
 		END_CRIT_SECTION();
 
 		rbuffer = GinNewBuffer(btree->index);
+		/* During index build, count the new page */
+		if (buildStats)
+		{
+			if (btree->isData)
+				buildStats->nDataPages++;
+			else
+				buildStats->nEntryPages++;
+		}
 
 		savedRightLink = GinPageGetOpaque(page)->rightlink;
 
@@ -331,64 +450,48 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 		 * newlpage is a pointer to memory page, it is not associated with
 		 * a buffer. stack->buffer is not touched yet.
 		 */
-		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off, &rdata);
+		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off,
+									insertdata, updateblkno,
+									&payloadrdata);
+
+		data.node = btree->index->rd_node;
+		data.rblkno = BufferGetBlockNumber(rbuffer);
+		data.flags = xlflags;
+		if (childbuf != InvalidBuffer)
+		{
+			Page childpage = BufferGetPage(childbuf);
+			data.leftChildBlkno = BufferGetBlockNumber(childbuf);
+			data.rightChildBlkno = GinPageGetOpaque(childpage)->rightlink;
+		}
+		else
+			data.leftChildBlkno = data.rightChildBlkno = InvalidBlockNumber;
 
-		((ginxlogSplit *) (rdata->data))->rootBlkno = rootBlkno;
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) &data;
+		rdata[0].len = sizeof(ginxlogSplit);
 
-		/* During index build, count the newly-split page */
-		if (buildStats)
+		if (childbuf != InvalidBuffer)
 		{
-			if (btree->isData)
-				buildStats->nDataPages++;
-			else
-				buildStats->nEntryPages++;
+			rdata[0].next = &rdata[1];
+
+			rdata[1].buffer = childbuf;
+			rdata[1].buffer_std = false;
+			rdata[1].data = NULL;
+			rdata[1].len = 0;
+			rdata[1].next = payloadrdata;
 		}
+		else
+			rdata[0].next = payloadrdata;
 
-		parent = stack->parent;
+		rpage = BufferGetPage(rbuffer);
 
-		if (parent == NULL)
+		if (stack->parent == NULL)
 		{
 			/*
 			 * split root, so we need to allocate new left page and place
 			 * pointer on root to left and right page
 			 */
-			Buffer		lbuffer = GinNewBuffer(btree->index);
-
-			((ginxlogSplit *) (rdata->data))->isRootSplit = TRUE;
-			((ginxlogSplit *) (rdata->data))->rrlink = InvalidBlockNumber;
-
-			lpage = BufferGetPage(lbuffer);
-			rpage = BufferGetPage(rbuffer);
-
-			GinPageGetOpaque(rpage)->rightlink = InvalidBlockNumber;
-			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
-			((ginxlogSplit *) (rdata->data))->lblkno = BufferGetBlockNumber(lbuffer);
-
-			START_CRIT_SECTION();
-
-			GinInitBuffer(stack->buffer, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF);
-			PageRestoreTempPage(newlpage, lpage);
-			btree->fillRoot(btree, stack->buffer, lbuffer, rbuffer);
-
-			MarkBufferDirty(rbuffer);
-			MarkBufferDirty(lbuffer);
-			MarkBufferDirty(stack->buffer);
-
-			if (RelationNeedsWAL(btree->index))
-			{
-				XLogRecPtr	recptr;
-
-				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-				PageSetLSN(page, recptr);
-				PageSetLSN(lpage, recptr);
-				PageSetLSN(rpage, recptr);
-			}
-
-			UnlockReleaseBuffer(rbuffer);
-			UnlockReleaseBuffer(lbuffer);
-			END_CRIT_SECTION();
-
-			/* During index build, count the newly-added root page */
+			lbuffer = GinNewBuffer(btree->index);
 			if (buildStats)
 			{
 				if (btree->isData)
@@ -397,87 +500,139 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 					buildStats->nEntryPages++;
 			}
 
-			return true;
+			/*
+			 * root never has a right-link, so we borrow the rrlink field to
+			 * store the root block number.
+			 */
+			data.rrlink = BufferGetBlockNumber(stack->buffer);
+			data.lblkno = BufferGetBlockNumber(lbuffer);
+			data.flags |= GIN_SPLIT_ROOT;
+
+			GinPageGetOpaque(rpage)->rightlink = InvalidBlockNumber;
+			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+
+			/*
+			 * Construct a new root page containing downlinks to the new left
+			 * and right pages. (do this in a temporary copy first rather
+			 * than overwriting the original page directly, so that we can still
+			 * abort gracefully if this fails.)
+			 */
+			newrootpg = PageGetTempPage(rpage);
+			GinInitPage(newrootpg, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF, BLCKSZ);
+
+			btree->fillRoot(btree, newrootpg,
+							BufferGetBlockNumber(lbuffer), newlpage,
+							BufferGetBlockNumber(rbuffer), rpage);
 		}
 		else
 		{
 			/* split non-root page */
-			((ginxlogSplit *) (rdata->data))->isRootSplit = FALSE;
-			((ginxlogSplit *) (rdata->data))->rrlink = savedRightLink;
-
-			lpage = BufferGetPage(stack->buffer);
-			rpage = BufferGetPage(rbuffer);
+			data.rrlink = savedRightLink;
+			data.lblkno = BufferGetBlockNumber(stack->buffer);
 
 			GinPageGetOpaque(rpage)->rightlink = savedRightLink;
+			GinPageGetOpaque(newlpage)->flags |= GIN_INCOMPLETE_SPLIT;
 			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+		}
 
-			START_CRIT_SECTION();
-			PageRestoreTempPage(newlpage, lpage);
+		/*
+		 * Ok, we have the new contents of the left page in a temporary copy
+		 * now (newlpage), and the newly-allocated right block has been filled
+		 * in. The original page is still unchanged.
+		 *
+		 * If this is a root split, we also have a temporary page containing
+		 * the new contents of the root. Copy the new left page to a
+		 * newly-allocated block, and initialize the (original) root page the
+		 * new copy. Otherwise, copy over the temporary copy of the new left
+		 * page over the old left page.
+		 */
 
-			MarkBufferDirty(rbuffer);
-			MarkBufferDirty(stack->buffer);
+		START_CRIT_SECTION();
 
-			if (RelationNeedsWAL(btree->index))
-			{
-				XLogRecPtr	recptr;
+		MarkBufferDirty(rbuffer);
 
-				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-				PageSetLSN(lpage, recptr);
-				PageSetLSN(rpage, recptr);
-			}
-			UnlockReleaseBuffer(rbuffer);
-			END_CRIT_SECTION();
+		if (stack->parent == NULL)
+		{
+			PageRestoreTempPage(newlpage, BufferGetPage(lbuffer));
+			MarkBufferDirty(lbuffer);
+			newlpage = newrootpg;
+		}
 
-			return false;
+		PageRestoreTempPage(newlpage, BufferGetPage(stack->buffer));
+		MarkBufferDirty(stack->buffer);
+
+		/* write WAL record */
+		if (RelationNeedsWAL(btree->index))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
+			PageSetLSN(BufferGetPage(stack->buffer), recptr);
+			PageSetLSN(rpage, recptr);
+			if (stack->parent == NULL)
+				PageSetLSN(BufferGetPage(lbuffer), recptr);
 		}
+		END_CRIT_SECTION();
+
+		/*
+		 * We can release the lock on the right page now, but keep the
+		 * original buffer locked.
+		 */
+		UnlockReleaseBuffer(rbuffer);
+		if (stack->parent == NULL)
+			UnlockReleaseBuffer(lbuffer);
+
+		/*
+		 * If we split the root, we're done. Otherwise the split is not
+		 * complete until the downlink for the new page has been inserted to
+		 * the parent.
+		 */
+		if (stack->parent == NULL)
+			return true;
+		else
+			return false;
 	}
 }
 
 /*
- * Insert value (stored in GinBtree) to tree described by stack
+ * Finish a split by inserting the downlink for the new page to parent.
  *
- * During an index build, buildStats is non-null and the counters
- * it contains are incremented as needed.
+ * On entry, stack->buffer is exclusively locked.
  *
- * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ * If freestack is true, all the buffers are released and unlocked as we
+ * crawl up the tree, and 'stack' is freed. Otherwise stack is unmodified,
+ * and stack->buffer is kept locked.
  */
-void
-ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
+static void
+ginFinishSplit(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
+			   bool freestack, GinStatsData *buildStats)
 {
-	GinBtreeStack *parent;
-	BlockNumber rootBlkno;
 	Page		page;
-
-	/* extract root BlockNumber from stack */
-	Assert(stack != NULL);
-	parent = stack;
-	while (parent->parent)
-		parent = parent->parent;
-	rootBlkno = parent->blkno;
-	Assert(BlockNumberIsValid(rootBlkno));
+	bool		done;
+	bool		first = true;
 
 	/* this loop crawls up the stack until the insertion is complete */
-	for (;;)
+	do
 	{
-		bool done;
-
-		done = ginPlaceToPage(btree, rootBlkno, stack, buildStats);
-
-		/* just to be extra sure we don't delete anything by accident... */
-		btree->isDelete = FALSE;
-
-		if (done)
-		{
-			LockBuffer(stack->buffer, GIN_UNLOCK);
-			freeGinBtreeStack(stack);
-			break;
-		}
-
-		btree->prepareDownlink(btree, stack->buffer);
+		GinBtreeStack *parent = stack->parent;
+		void	   *insertdata;
+		BlockNumber updateblkno;
 
 		/* search parent to lock */
 		LockBuffer(parent->buffer, GIN_EXCLUSIVE);
 
+		/*
+		 * If the parent page was incompletely split, finish that split first,
+		 * then continue with the current one.
+		 *
+		 * Note: we have to finish *all* incomplete splits we encounter, even
+		 * if we have to move right. Otherwise we might choose as the target
+		 * a page that has no downlink in the parent, and splitting it further
+		 * would fail.
+		 */
+		if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+			ginFinishSplit(btree, rootBlkno, parent, false, buildStats);
+
 		/* move right if it's needed */
 		page = BufferGetPage(parent->buffer);
 		while ((parent->off = btree->findChildPtr(btree, page, stack->blkno, parent->off)) == InvalidOffsetNumber)
@@ -500,10 +655,78 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 			parent->buffer = ginStepRight(parent->buffer, btree->index, GIN_EXCLUSIVE);
 			parent->blkno = rightlink;
 			page = BufferGetPage(parent->buffer);
+
+			if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+				ginFinishSplit(btree, rootBlkno, parent, false, buildStats);
 		}
 
-		UnlockReleaseBuffer(stack->buffer);
-		pfree(stack);
+		insertdata = btree->prepareDownlink(btree, stack->buffer);
+		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
+		done = ginPlaceToPage(btree, parent,
+							  insertdata, updateblkno,
+							  stack->buffer, buildStats);
+		pfree(insertdata);
+
+		/*
+		 * If the caller requested to free the stack, unlock and release the
+		 * child buffer now. Otherwise keep it pinned and locked, but if
+		 * we have to recurse up the tree, we can unlock the upper pages,
+		 * only keeping the page at the bottom of the stack locked.
+		 */
+		if (!first || freestack)
+			LockBuffer(stack->buffer, GIN_UNLOCK);
+		if (freestack)
+		{
+			ReleaseBuffer(stack->buffer);
+			pfree(stack);
+		}
 		stack = parent;
+
+		first = false;
+	} while (!done);
+
+	if (freestack)
+	{
+		LockBuffer(stack->buffer, GIN_UNLOCK);
+		freeGinBtreeStack(stack);
 	}
 }
+
+/*
+ * Insert value (stored in GinBtree) to tree described by stack
+ *
+ * During an index build, buildStats is non-null and the counters
+ * it contains are incremented as needed.
+ *
+ * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ */
+void
+ginInsertValue(GinBtree btree, GinBtreeStack *stack, void *insertdata,
+			   GinStatsData *buildStats)
+{
+	GinBtreeStack *parent;
+	BlockNumber rootBlkno;
+	bool		done;
+
+	/* extract root BlockNumber from stack */
+	Assert(stack != NULL);
+	parent = stack;
+	while (parent->parent)
+		parent = parent->parent;
+	rootBlkno = parent->blkno;
+	Assert(BlockNumberIsValid(rootBlkno));
+
+	/* If the leaf page was incompletely split, finish the split first */
+	if (GinPageIsIncompleteSplit(BufferGetPage(stack->buffer)))
+		ginFinishSplit(btree, rootBlkno, stack, false, buildStats);
+
+	done = ginPlaceToPage(btree, stack, insertdata, InvalidBlockNumber,
+						  InvalidBuffer, buildStats);
+	if (done)
+	{
+		LockBuffer(stack->buffer, GIN_UNLOCK);
+		freeGinBtreeStack(stack);
+	}
+	else
+		ginFinishSplit(btree, rootBlkno, stack, true, buildStats);
+}
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 908e28f..d4458f7 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -30,7 +30,7 @@ dataIsMoveRight(GinBtree btree, Page page)
 	if (GinPageRightMost(page))
 		return FALSE;
 
-	return (ginCompareItemPointers(btree->items + btree->curitem, iptr) > 0) ? TRUE : FALSE;
+	return (ginCompareItemPointers(&btree->itemptr, iptr) > 0) ? TRUE : FALSE;
 }
 
 /*
@@ -80,7 +80,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
 		else
 		{
 			pitem = GinDataPageGetPostingItem(page, mid);
-			result = ginCompareItemPointers(btree->items + btree->curitem, &(pitem->key));
+			result = ginCompareItemPointers(&btree->itemptr, &(pitem->key));
 		}
 
 		if (result == 0)
@@ -138,7 +138,7 @@ dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
 	{
 		OffsetNumber mid = low + ((high - low) / 2);
 
-		result = ginCompareItemPointers(btree->items + btree->curitem,
+		result = ginCompareItemPointers(&btree->itemptr,
 										GinDataPageGetItemPointer(page, mid));
 
 		if (result == 0)
@@ -298,18 +298,18 @@ GinPageDeletePostingItem(Page page, OffsetNumber offset)
  * item pointer never deletes!
  */
 static bool
-dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
+dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off, void *insertdata)
 {
 	Page		page = BufferGetPage(buf);
 
 	Assert(GinPageIsData(page));
-	Assert(!btree->isDelete);
 
 	if (GinPageIsLeaf(page))
 	{
+		GinBtreeDataLeafInsertData *items = insertdata;
 		if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
 		{
-			if ((btree->nitem - btree->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
+			if ((items->nitem - items->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
 				return true;
 		}
 		else if (sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
@@ -322,31 +322,6 @@ dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
 }
 
 /*
- * In case of previous split update old child blkno to
- * new right page
- * item pointer never deletes!
- */
-static BlockNumber
-dataPrepareData(GinBtree btree, Page page, OffsetNumber off)
-{
-	BlockNumber ret = InvalidBlockNumber;
-
-	Assert(GinPageIsData(page));
-
-	if (!GinPageIsLeaf(page) && btree->rightblkno != InvalidBlockNumber)
-	{
-		PostingItem *pitem = GinDataPageGetPostingItem(page, off);
-
-		PostingItemSetBlockNumber(pitem, btree->rightblkno);
-		ret = btree->rightblkno;
-	}
-
-	btree->rightblkno = InvalidBlockNumber;
-
-	return ret;
-}
-
-/*
  * Places keys to page and fills WAL record. In case leaf page and
  * build mode puts all ItemPointers to page.
  *
@@ -354,85 +329,74 @@ dataPrepareData(GinBtree btree, Page page, OffsetNumber off)
  */
 static bool
 dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
+				void *insertdata, BlockNumber updateblkno,
 				XLogRecData **prdata)
 {
 	Page		page = BufferGetPage(buf);
 	int			sizeofitem = GinSizeOfDataPageItem(page);
-	int			cnt = 0;
 
 	/* these must be static so they can be returned to caller */
 	static XLogRecData rdata[3];
-	static ginxlogInsert data;
 
 	/* quick exit if it doesn't fit */
-	if (!dataIsEnoughSpace(btree, buf, off))
+	if (!dataIsEnoughSpace(btree, buf, off, insertdata))
 		return false;
 
 	*prdata = rdata;
 	Assert(GinPageIsData(page));
 
-	data.updateBlkno = dataPrepareData(btree, page, off);
-
-	data.node = btree->index->rd_node;
-	data.blkno = BufferGetBlockNumber(buf);
-	data.offset = off;
-	data.nitem = 1;
-	data.isDelete = FALSE;
-	data.isData = TRUE;
-	data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
-
-	/*
-	 * Prevent full page write if child's split occurs. That is needed to
-	 * remove incomplete splits while replaying WAL
-	 *
-	 * data.updateBlkno contains new block number (of newly created right
-	 * page) for recently splited page.
-	 */
-	if (data.updateBlkno == InvalidBlockNumber)
+	/* Update existing downlink to point to next page (on internal page) */
+	if (!GinPageIsLeaf(page))
 	{
-		rdata[0].buffer = buf;
-		rdata[0].buffer_std = FALSE;
-		rdata[0].data = NULL;
-		rdata[0].len = 0;
-		rdata[0].next = &rdata[1];
-		cnt++;
+		PostingItem *pitem = GinDataPageGetPostingItem(page, off);
+		PostingItemSetBlockNumber(pitem, updateblkno);
 	}
 
-	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (char *) &data;
-	rdata[cnt].len = sizeof(ginxlogInsert);
-	rdata[cnt].next = &rdata[cnt + 1];
-	cnt++;
-
-	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem));
-	rdata[cnt].len = sizeofitem;
-	rdata[cnt].next = NULL;
-
 	if (GinPageIsLeaf(page))
 	{
+		GinBtreeDataLeafInsertData *items = insertdata;
+		static ginxlogInsertDataLeaf data;
+		data.nitem = 1;
+
 		if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
 		{
 			/* usually, create index... */
-			uint32		savedPos = btree->curitem;
+			uint32		savedPos = items->curitem;
 
-			while (btree->curitem < btree->nitem)
+			while (items->curitem < items->nitem)
 			{
-				GinDataPageAddItemPointer(page, btree->items + btree->curitem, off);
+				GinDataPageAddItemPointer(page, items->items + items->curitem, off);
 				off++;
-				btree->curitem++;
+				items->curitem++;
 			}
-			data.nitem = btree->curitem - savedPos;
-			rdata[cnt].len = sizeofitem * data.nitem;
+			data.nitem = items->curitem - savedPos;
 		}
 		else
 		{
-			GinDataPageAddItemPointer(page, btree->items + btree->curitem, off);
-			btree->curitem++;
+			GinDataPageAddItemPointer(page, items->items + items->curitem, off);
+			items->curitem++;
 		}
+
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) &data;
+		rdata[0].len = offsetof(ginxlogInsertDataLeaf, items);
+		rdata[0].next = &rdata[1];
+
+		rdata[1].buffer = InvalidBuffer;
+		rdata[1].data = (char *) (items->items + items->curitem);
+		rdata[1].len = data.nitem * sizeofitem;
+		rdata[1].next = NULL;
 	}
 	else
-		GinDataPageAddPostingItem(page, &(btree->pitem), off);
+	{
+		PostingItem *pitem = insertdata;
+		GinDataPageAddPostingItem(page, pitem, off);
+
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) pitem;
+		rdata[0].len = sizeof(PostingItem);
+		rdata[0].next = NULL;
+	}
 
 	return true;
 }
@@ -444,7 +408,8 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
  * left page
  */
 static Page
-dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
+dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+			  void *insertdata, BlockNumber updateblkno, XLogRecData **prdata)
 {
 	char	   *ptr;
 	OffsetNumber separator;
@@ -457,20 +422,23 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 	Page		rpage = BufferGetPage(rbuf);
 	Size		pageSize = PageGetPageSize(lpage);
 	Size		freeSpace;
-	uint32		nCopied = 1;
 
 	/* these must be static so they can be returned to caller */
-	static ginxlogSplit data;
-	static XLogRecData rdata[4];
+	static ginxlogSplitData data;
+	static XLogRecData rdata[2];
 	static char vector[2 * BLCKSZ];
 
 	GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
 	freeSpace = GinDataPageGetFreeSpace(rpage);
 
 	*prdata = rdata;
-	data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
-		InvalidOffsetNumber : PostingItemGetBlockNumber(&(btree->pitem));
-	data.updateBlkno = dataPrepareData(btree, lpage, off);
+
+	/* Update existing downlink to point to next page (on internal page) */
+	if (!isleaf)
+	{
+		PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+		PostingItemSetBlockNumber(pitem, updateblkno);
+	}
 
 	if (isleaf)
 	{
@@ -487,16 +455,15 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 
 	if (isleaf && GinPageRightMost(lpage) && off > GinPageGetOpaque(lpage)->maxoff)
 	{
-		nCopied = 0;
-		while (btree->curitem < btree->nitem &&
+		GinBtreeDataLeafInsertData *items = insertdata;
+		while (items->curitem < items->nitem &&
 			   maxoff * sizeof(ItemPointerData) < 2 * (freeSpace - sizeof(ItemPointerData)))
 		{
 			memcpy(vector + maxoff * sizeof(ItemPointerData),
-				   btree->items + btree->curitem,
+				   items->items + items->curitem,
 				   sizeof(ItemPointerData));
 			maxoff++;
-			nCopied++;
-			btree->curitem++;
+			items->curitem++;
 		}
 	}
 	else
@@ -506,11 +473,15 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 			memmove(ptr + sizeofitem, ptr, (maxoff - off + 1) * sizeofitem);
 		if (isleaf)
 		{
-			memcpy(ptr, btree->items + btree->curitem, sizeofitem);
-			btree->curitem++;
+			GinBtreeDataLeafInsertData *items = insertdata;
+			memcpy(ptr, items->items + items->curitem, sizeofitem);
+			items->curitem++;
 		}
 		else
-			memcpy(ptr, &(btree->pitem), sizeofitem);
+		{
+			PostingItem *pitem = insertdata;
+			memcpy(ptr, pitem, sizeofitem);
+		}
 
 		maxoff++;
 	}
@@ -559,20 +530,13 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 	bound = GinDataPageGetRightBound(rpage);
 	*bound = oldbound;
 
-	data.node = btree->index->rd_node;
-	data.rootBlkno = InvalidBlockNumber;
-	data.lblkno = BufferGetBlockNumber(lbuf);
-	data.rblkno = BufferGetBlockNumber(rbuf);
 	data.separator = separator;
 	data.nitem = maxoff;
-	data.isData = TRUE;
-	data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
-	data.isRootSplit = FALSE;
 	data.rightbound = oldbound;
 
 	rdata[0].buffer = InvalidBuffer;
 	rdata[0].data = (char *) &data;
-	rdata[0].len = sizeof(ginxlogSplit);
+	rdata[0].len = sizeof(ginxlogSplitData);
 	rdata[0].next = &rdata[1];
 
 	rdata[1].buffer = InvalidBuffer;
@@ -586,14 +550,16 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 /*
  * Prepare the state in 'btree' for inserting a downlink for given buffer.
  */
-static void
+static void *
 dataPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
+	PostingItem	*pitem = palloc(sizeof(PostingItem));
 	Page		lpage = BufferGetPage(lbuf);
 
-	PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
-	btree->pitem.key = *GinDataPageGetRightBound(lpage);
-	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
+	PostingItemSetBlockNumber(pitem, BufferGetBlockNumber(lbuf));
+	pitem->key = *GinDataPageGetRightBound(lpage);
+
+	return pitem;
 }
 
 /*
@@ -601,21 +567,18 @@ dataPrepareDownlink(GinBtree btree, Buffer lbuf)
  * Also called from ginxlog, should not use btree
  */
 void
-ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
+ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage)
 {
-	Page		page = BufferGetPage(root),
-				lpage = BufferGetPage(lbuf),
-				rpage = BufferGetPage(rbuf);
 	PostingItem li,
 				ri;
 
 	li.key = *GinDataPageGetRightBound(lpage);
-	PostingItemSetBlockNumber(&li, BufferGetBlockNumber(lbuf));
-	GinDataPageAddPostingItem(page, &li, InvalidOffsetNumber);
+	PostingItemSetBlockNumber(&li, lblkno);
+	GinDataPageAddPostingItem(root, &li, InvalidOffsetNumber);
 
 	ri.key = *GinDataPageGetRightBound(rpage);
-	PostingItemSetBlockNumber(&ri, BufferGetBlockNumber(rbuf));
-	GinDataPageAddPostingItem(page, &ri, InvalidOffsetNumber);
+	PostingItemSetBlockNumber(&ri, rblkno);
+	GinDataPageAddPostingItem(root, &ri, InvalidOffsetNumber);
 }
 
 /*
@@ -715,7 +678,6 @@ ginPrepareDataScan(GinBtree btree, Relation index)
 	btree->prepareDownlink = dataPrepareDownlink;
 
 	btree->isData = TRUE;
-	btree->isDelete = FALSE;
 	btree->fullScan = FALSE;
 	btree->isBuild = FALSE;
 }
@@ -729,29 +691,32 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  GinStatsData *buildStats)
 {
 	GinBtreeData btree;
+	GinBtreeDataLeafInsertData insertdata;
 	GinBtreeStack *stack;
 
 	ginPrepareDataScan(&btree, index);
 	btree.isBuild = (buildStats != NULL);
-	btree.items = items;
-	btree.nitem = nitem;
-	btree.curitem = 0;
+	insertdata.items = items;
+	insertdata.nitem = nitem;
+	insertdata.curitem = 0;
 
-	while (btree.curitem < btree.nitem)
+	while (insertdata.curitem < insertdata.nitem)
 	{
+		/* search for the leaf page where the first item should go to */
+		btree.itemptr = insertdata.items[insertdata.curitem];
 		stack = ginFindLeafPage(&btree, rootBlkno, false);
 
 		if (btree.findItem(&btree, stack))
 		{
 			/*
-			 * btree.items[btree.curitem] already exists in index
+			 * Current item already exists in index.
 			 */
-			btree.curitem++;
+			insertdata.curitem++;
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 			freeGinBtreeStack(stack);
 		}
 		else
-			ginInsertValue(&btree, stack, buildStats);
+			ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
 }
 
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 378bce1..c6096c2 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -431,22 +431,23 @@ entryGetLeftMostPage(GinBtree btree, Page page)
 }
 
 static bool
-entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
+entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
+				   GinBtreeEntryInsertData *entry)
 {
 	Size		itupsz = 0;
 	Page		page = BufferGetPage(buf);
 
-	Assert(btree->entry);
+	Assert(entry->entry);
 	Assert(!GinPageIsData(page));
 
-	if (btree->isDelete)
+	if (entry->isDelete)
 	{
 		IndexTuple	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, off));
 
 		itupsz = MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
 	}
 
-	if (PageGetFreeSpace(page) + itupsz >= MAXALIGN(IndexTupleSize(btree->entry)) + sizeof(ItemIdData))
+	if (PageGetFreeSpace(page) + itupsz >= MAXALIGN(IndexTupleSize(entry->entry)) + sizeof(ItemIdData))
 		return true;
 
 	return false;
@@ -457,31 +458,25 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
  * should update it, update old child blkno to new right page
  * if child split occurred
  */
-static BlockNumber
-entryPreparePage(GinBtree btree, Page page, OffsetNumber off)
+static void
+entryPreparePage(GinBtree btree, Page page, OffsetNumber off,
+				 GinBtreeEntryInsertData *entry, BlockNumber updateblkno)
 {
-	BlockNumber ret = InvalidBlockNumber;
-
-	Assert(btree->entry);
+	Assert(entry->entry);
 	Assert(!GinPageIsData(page));
 
-	if (btree->isDelete)
+	if (entry->isDelete)
 	{
 		Assert(GinPageIsLeaf(page));
 		PageIndexTupleDelete(page, off);
 	}
 
-	if (!GinPageIsLeaf(page) && btree->rightblkno != InvalidBlockNumber)
+	if (!GinPageIsLeaf(page) && updateblkno != InvalidBlockNumber)
 	{
 		IndexTuple	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, off));
 
-		GinSetDownlink(itup, btree->rightblkno);
-		ret = btree->rightblkno;
+		GinSetDownlink(itup, updateblkno);
 	}
-
-	btree->rightblkno = InvalidBlockNumber;
-
-	return ret;
 }
 
 /*
@@ -491,66 +486,43 @@ entryPreparePage(GinBtree btree, Page page, OffsetNumber off)
  */
 static bool
 entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
+				 void *insertdata, BlockNumber updateblkno,
 				 XLogRecData **prdata)
 {
+	GinBtreeEntryInsertData *entry = insertdata;
 	Page		page = BufferGetPage(buf);
 	OffsetNumber placed;
 	int			cnt = 0;
 
 	/* these must be static so they can be returned to caller */
 	static XLogRecData rdata[3];
-	static ginxlogInsert data;
+	static ginxlogInsertEntry data;
 
 	/* quick exit if it doesn't fit */
-	if (!entryIsEnoughSpace(btree, buf, off))
+	if (!entryIsEnoughSpace(btree, buf, off, entry))
 		return false;
 
 	*prdata = rdata;
-	data.updateBlkno = entryPreparePage(btree, page, off);
+	entryPreparePage(btree, page, off, entry, updateblkno);
 
-	placed = PageAddItem(page, (Item) btree->entry, IndexTupleSize(btree->entry), off, false, false);
+	placed = PageAddItem(page, (Item) entry->entry, IndexTupleSize(entry->entry), off, false, false);
 	if (placed != off)
 		elog(ERROR, "failed to add item to index page in \"%s\"",
 			 RelationGetRelationName(btree->index));
 
-	data.node = btree->index->rd_node;
-	data.blkno = BufferGetBlockNumber(buf);
-	data.offset = off;
-	data.nitem = 1;
-	data.isDelete = btree->isDelete;
-	data.isData = false;
-	data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
-
-	/*
-	 * Prevent full page write if child's split occurs. That is needed to
-	 * remove incomplete splits while replaying WAL
-	 *
-	 * data.updateBlkno contains new block number (of newly created right
-	 * page) for recently splited page.
-	 */
-	if (data.updateBlkno == InvalidBlockNumber)
-	{
-		rdata[0].buffer = buf;
-		rdata[0].buffer_std = TRUE;
-		rdata[0].data = NULL;
-		rdata[0].len = 0;
-		rdata[0].next = &rdata[1];
-		cnt++;
-	}
+	data.isDelete = entry->isDelete;
 
 	rdata[cnt].buffer = InvalidBuffer;
 	rdata[cnt].data = (char *) &data;
-	rdata[cnt].len = sizeof(ginxlogInsert);
+	rdata[cnt].len = offsetof(ginxlogInsertEntry, tuple);
 	rdata[cnt].next = &rdata[cnt + 1];
 	cnt++;
 
 	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (char *) btree->entry;
-	rdata[cnt].len = IndexTupleSize(btree->entry);
+	rdata[cnt].data = (char *) entry->entry;
+	rdata[cnt].len = IndexTupleSize(entry->entry);
 	rdata[cnt].next = NULL;
 
-	btree->entry = NULL;
-
 	return true;
 }
 
@@ -561,8 +533,10 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
  * an equal number!
  */
 static Page
-entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
+entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+			   void *insertdata, BlockNumber updateblkno, XLogRecData **prdata)
 {
+	GinBtreeEntryInsertData *entry = insertdata;
 	OffsetNumber i,
 				maxoff,
 				separator = InvalidOffsetNumber;
@@ -578,13 +552,11 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 
 	/* these must be static so they can be returned to caller */
 	static XLogRecData rdata[2];
-	static ginxlogSplit data;
+	static ginxlogSplitEntry data;
 	static char tupstore[2 * BLCKSZ];
 
 	*prdata = rdata;
-	data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
-		InvalidOffsetNumber : GinGetDownlink(btree->entry);
-	data.updateBlkno = entryPreparePage(btree, lpage, off);
+	entryPreparePage(btree, lpage, off, entry, updateblkno);
 
 	maxoff = PageGetMaxOffsetNumber(lpage);
 	ptr = tupstore;
@@ -593,8 +565,8 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 	{
 		if (i == off)
 		{
-			size = MAXALIGN(IndexTupleSize(btree->entry));
-			memcpy(ptr, btree->entry, size);
+			size = MAXALIGN(IndexTupleSize(entry->entry));
+			memcpy(ptr, entry->entry, size);
 			ptr += size;
 			totalsize += size + sizeof(ItemIdData);
 		}
@@ -608,8 +580,8 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 
 	if (off == maxoff + 1)
 	{
-		size = MAXALIGN(IndexTupleSize(btree->entry));
-		memcpy(ptr, btree->entry, size);
+		size = MAXALIGN(IndexTupleSize(entry->entry));
+		memcpy(ptr, entry->entry, size);
 		ptr += size;
 		totalsize += size + sizeof(ItemIdData);
 	}
@@ -643,15 +615,8 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 		ptr += MAXALIGN(IndexTupleSize(itup));
 	}
 
-	data.node = btree->index->rd_node;
-	data.rootBlkno = InvalidBlockNumber;
-	data.lblkno = BufferGetBlockNumber(lbuf);
-	data.rblkno = BufferGetBlockNumber(rbuf);
 	data.separator = separator;
 	data.nitem = maxoff;
-	data.isData = FALSE;
-	data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
-	data.isRootSplit = FALSE;
 
 	rdata[0].buffer = InvalidBuffer;
 	rdata[0].data = (char *) &data;
@@ -669,17 +634,19 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 /*
  * Prepare the state in 'btree' for inserting a downlink for given buffer.
  */
-static void
+static void *
 entryPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
+	GinBtreeEntryInsertData *entry = palloc(sizeof(GinBtreeEntryInsertData));
 	Page		lpage = BufferGetPage(lbuf);
 	IndexTuple	itup;
 
 	itup = getRightMostTuple(lpage);
-
-	btree->entry = GinFormInteriorTuple(itup, lpage,
+	entry->entry = GinFormInteriorTuple(itup, lpage,
 										BufferGetBlockNumber(lbuf));
-	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
+	entry->isDelete = false;
+
+	return entry;
 }
 
 /*
@@ -687,22 +654,19 @@ entryPrepareDownlink(GinBtree btree, Buffer lbuf)
  * Also called from ginxlog, should not use btree
  */
 void
-ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
+ginEntryFillRoot(GinBtree btree, Page root,
+				 BlockNumber lblkno, Page lpage,
+				 BlockNumber rblkno, Page rpage)
 {
-	Page		page = BufferGetPage(root);
-	Page		lpage = BufferGetPage(lbuf);
-	Page		rpage = BufferGetPage(rbuf);
 	IndexTuple	itup;
 
-	itup = GinFormInteriorTuple(getRightMostTuple(lpage), lpage,
-								BufferGetBlockNumber(lbuf));
-	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+	itup = GinFormInteriorTuple(getRightMostTuple(lpage), lpage, lblkno);
+	if (PageAddItem(root, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
 
-	itup = GinFormInteriorTuple(getRightMostTuple(rpage), rpage,
-								BufferGetBlockNumber(rbuf));
-	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+	itup = GinFormInteriorTuple(getRightMostTuple(rpage), rpage, rblkno);
+	if (PageAddItem(root, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
 }
@@ -740,5 +704,4 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 	btree->entryAttnum = attnum;
 	btree->entryKey = key;
 	btree->entryCategory = category;
-	btree->isDelete = FALSE;
 }
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 0a2883a..cb58052 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -163,10 +163,13 @@ ginEntryInsert(GinState *ginstate,
 			   GinStatsData *buildStats)
 {
 	GinBtreeData btree;
+	GinBtreeEntryInsertData insertdata;
 	GinBtreeStack *stack;
 	IndexTuple	itup;
 	Page		page;
 
+	insertdata.isDelete = FALSE;
+
 	/* During index build, count the to-be-inserted entry */
 	if (buildStats)
 		buildStats->nEntries++;
@@ -201,7 +204,7 @@ ginEntryInsert(GinState *ginstate,
 		itup = addItemPointersToLeafTuple(ginstate, itup,
 										  items, nitem, buildStats);
 
-		btree.isDelete = TRUE;
+		insertdata.isDelete = TRUE;
 	}
 	else
 	{
@@ -211,8 +214,8 @@ ginEntryInsert(GinState *ginstate,
 	}
 
 	/* Insert the new or modified leaf tuple */
-	btree.entry = itup;
-	ginInsertValue(&btree, stack, buildStats);
+	insertdata.entry = itup;
+	ginInsertValue(&btree, stack, &insertdata, buildStats);
 	pfree(itup);
 }
 
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index ddac343..cae5251 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -18,55 +18,27 @@
 #include "utils/memutils.h"
 
 static MemoryContext opCtx;		/* working memory for operations */
-static MemoryContext topCtx;
-
-typedef struct ginIncompleteSplit
-{
-	RelFileNode node;
-	BlockNumber leftBlkno;
-	BlockNumber rightBlkno;
-	BlockNumber rootBlkno;
-} ginIncompleteSplit;
-
-static List *incomplete_splits;
 
 static void
-pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber rightBlkno, BlockNumber rootBlkno)
+ginRedoClearIncompleteSplit(XLogRecPtr lsn, RelFileNode node, BlockNumber blkno)
 {
-	ginIncompleteSplit *split;
-
-	MemoryContextSwitchTo(topCtx);
-
-	split = palloc(sizeof(ginIncompleteSplit));
-
-	split->node = node;
-	split->leftBlkno = leftBlkno;
-	split->rightBlkno = rightBlkno;
-	split->rootBlkno = rootBlkno;
-
-	incomplete_splits = lappend(incomplete_splits, split);
-
-	MemoryContextSwitchTo(opCtx);
-}
+	Buffer		buffer;
+	Page		page;
 
-static void
-forgetIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber updateBlkno)
-{
-	ListCell   *l;
+	buffer = XLogReadBuffer(node, blkno, false);
+	if (!BufferIsValid(buffer))
+		return;					/* page was deleted, nothing to do */
+	page = (Page) BufferGetPage(buffer);
 
-	foreach(l, incomplete_splits)
+	if (lsn > PageGetLSN(page))
 	{
-		ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
+		GinPageGetOpaque(page)->flags &= ~GIN_INCOMPLETE_SPLIT;
 
-		if (RelFileNodeEquals(node, split->node) &&
-			leftBlkno == split->leftBlkno &&
-			updateBlkno == split->rightBlkno)
-		{
-			incomplete_splits = list_delete_ptr(incomplete_splits, split);
-			pfree(split);
-			break;
-		}
+		PageSetLSN(page, lsn);
+		MarkBufferDirty(buffer);
 	}
+
+	UnlockReleaseBuffer(buffer);
 }
 
 static void
@@ -128,44 +100,104 @@ ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
 }
 
 static void
-ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
+ginRedoInsertEntry(Buffer buffer, OffsetNumber offset, BlockNumber rightblkno,
+				   void *rdata)
 {
-	ginxlogInsert *data = (ginxlogInsert *) XLogRecGetData(record);
-	Buffer		buffer;
-	Page		page;
+	Page		page = BufferGetPage(buffer);
+	ginxlogInsertEntry *data = (ginxlogInsertEntry *) rdata;
+	IndexTuple	itup;
 
-	/* first, forget any incomplete split this insertion completes */
-	if (data->isData)
+	if (rightblkno != InvalidBlockNumber)
 	{
-		Assert(data->isDelete == FALSE);
-		if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		{
-			PostingItem *pitem;
+		/* update link to right page after split */
+		Assert(!GinPageIsLeaf(page));
+		Assert(offset >= FirstOffsetNumber && offset <= PageGetMaxOffsetNumber(page));
+		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+		GinSetDownlink(itup, rightblkno);
+	}
 
-			pitem = (PostingItem *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-			forgetIncompleteSplit(data->node,
-								  PostingItemGetBlockNumber(pitem),
-								  data->updateBlkno);
-		}
+	if (data->isDelete)
+	{
+		Assert(GinPageIsLeaf(page));
+		Assert(offset >= FirstOffsetNumber && offset <= PageGetMaxOffsetNumber(page));
+		PageIndexTupleDelete(page, offset);
+	}
+
+	itup = &data->tuple;
+
+	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), offset, false, false) == InvalidOffsetNumber)
+	{
+		RelFileNode node;
+		ForkNumber forknum;
+		BlockNumber blknum;
 
+		BufferGetTag(buffer, &node, &forknum, &blknum);
+		elog(ERROR, "failed to add item to index page in %u/%u/%u",
+			 node.spcNode, node.dbNode, node.relNode);
+	}
+}
+
+static void
+ginRedoInsertData(Buffer buffer, OffsetNumber offset, BlockNumber rightblkno,
+				  void *rdata)
+{
+	Page		page = BufferGetPage(buffer);
+
+	if (GinPageIsLeaf(page))
+	{
+		ginxlogInsertDataLeaf *data = (ginxlogInsertDataLeaf *) rdata;
+		ItemPointerData *items = data->items;
+		OffsetNumber i;
+
+		for (i = 0; i < data->nitem; i++)
+			GinDataPageAddItemPointer(page, &items[i], offset + i);
 	}
 	else
 	{
-		if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		{
-			IndexTuple	itup;
+		PostingItem *oldpitem;
 
-			itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-			forgetIncompleteSplit(data->node,
-								  GinGetDownlink(itup),
-								  data->updateBlkno);
-		}
+		/* update link to right page after split */
+		oldpitem = GinDataPageGetPostingItem(page, offset);
+		PostingItemSetBlockNumber(oldpitem, rightblkno);
+
+		GinDataPageAddPostingItem(page, (PostingItem *) rdata, offset);
+	}
+}
+
+static void
+ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
+{
+	ginxlogInsert *data = (ginxlogInsert *) XLogRecGetData(record);
+	Buffer		buffer;
+	Page		page;
+	char	   *payload;
+	BlockNumber leftChildBlkno = InvalidBlockNumber;
+	BlockNumber rightChildBlkno = InvalidBlockNumber;
+	bool		isLeaf = (data->flags & GIN_SPLIT_ISLEAF) != 0;
+
+	payload = XLogRecGetData(record) + sizeof(ginxlogInsert);
+
+	/*
+	 * First clear incomplete-split flag on child page if this finishes
+	 * a split
+	 */
+	if (!isLeaf)
+	{
+		memcpy(&leftChildBlkno, payload, sizeof(BlockNumber));
+		payload += sizeof(BlockNumber);
+		memcpy(&rightChildBlkno, payload, sizeof(BlockNumber));
+		payload += sizeof(BlockNumber);
+
+		if (record->xl_info & XLR_BKP_BLOCK(0))
+			(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		else
+			ginRedoClearIncompleteSplit(lsn, data->node, leftChildBlkno);
 	}
 
 	/* If we have a full-page image, restore it and we're done */
-	if (record->xl_info & XLR_BKP_BLOCK(0))
+	if (record->xl_info & XLR_BKP_BLOCK(isLeaf ? 0 : 1))
 	{
-		(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		(void) RestoreBackupBlock(lsn, record, isLeaf ? 0 : 1, false, false);
 		return;
 	}
 
@@ -176,74 +208,88 @@ ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
 
 	if (lsn > PageGetLSN(page))
 	{
-		if (data->isData)
+		/* How to insert the payload is tree-type specific */
+		if (data->flags & GIN_SPLIT_ISDATA)
 		{
 			Assert(GinPageIsData(page));
-
-			if (data->isLeaf)
-			{
-				OffsetNumber i;
-				ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-
-				Assert(GinPageIsLeaf(page));
-				Assert(data->updateBlkno == InvalidBlockNumber);
-
-				for (i = 0; i < data->nitem; i++)
-					GinDataPageAddItemPointer(page, &items[i], data->offset + i);
-			}
-			else
-			{
-				PostingItem *pitem;
-
-				Assert(!GinPageIsLeaf(page));
-
-				if (data->updateBlkno != InvalidBlockNumber)
-				{
-					/* update link to right page after split */
-					pitem = GinDataPageGetPostingItem(page, data->offset);
-					PostingItemSetBlockNumber(pitem, data->updateBlkno);
-				}
-
-				pitem = (PostingItem *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-
-				GinDataPageAddPostingItem(page, pitem, data->offset);
-			}
+			ginRedoInsertData(buffer, data->offset, rightChildBlkno, payload);
 		}
 		else
 		{
-			IndexTuple	itup;
-
 			Assert(!GinPageIsData(page));
+			ginRedoInsertEntry(buffer, data->offset, rightChildBlkno, payload);
+		}
 
-			if (data->updateBlkno != InvalidBlockNumber)
-			{
-				/* update link to right page after split */
-				Assert(!GinPageIsLeaf(page));
-				Assert(data->offset >= FirstOffsetNumber && data->offset <= PageGetMaxOffsetNumber(page));
-				itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, data->offset));
-				GinSetDownlink(itup, data->updateBlkno);
-			}
+		PageSetLSN(page, lsn);
+		MarkBufferDirty(buffer);
+	}
 
-			if (data->isDelete)
-			{
-				Assert(GinPageIsLeaf(page));
-				Assert(data->offset >= FirstOffsetNumber && data->offset <= PageGetMaxOffsetNumber(page));
-				PageIndexTupleDelete(page, data->offset);
-			}
+	UnlockReleaseBuffer(buffer);
+}
 
-			itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogInsert));
+static void
+ginRedoSplitEntry(Buffer lbuffer, Buffer rbuffer, void *rdata)
+{
+	Page		lpage = BufferGetPage(lbuffer);
+	Page		rpage = BufferGetPage(rbuffer);
+	ginxlogSplitEntry *data = (ginxlogSplitEntry *) rdata;
+	IndexTuple	itup = (IndexTuple) ((char *) rdata + sizeof(ginxlogSplitEntry));
+	OffsetNumber i;
 
-			if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), data->offset, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-		}
+	for (i = 0; i < data->separator; i++)
+	{
+		if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+			elog(ERROR, "failed to add item to gin index page");
+		itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
+	}
 
-		PageSetLSN(page, lsn);
+	for (i = data->separator; i < data->nitem; i++)
+	{
+		if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+			elog(ERROR, "failed to add item to gin index page");
+		itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
+	}
+}
 
-		MarkBufferDirty(buffer);
+static void
+ginRedoSplitData(Buffer lbuffer, Buffer rbuffer, void *rdata)
+{
+	Page		lpage = BufferGetPage(lbuffer);
+	Page		rpage = BufferGetPage(rbuffer);
+	ginxlogSplitData *data = (ginxlogSplitData *) rdata;
+	bool		isleaf = GinPageIsLeaf(lpage);
+	char	   *ptr = (char *) rdata + sizeof(ginxlogSplit);
+	Size		sizeofitem = isleaf ? sizeof(ItemPointerData) : sizeof(PostingItem);
+	OffsetNumber i;
+	ItemPointer bound;
+
+	for (i = 0; i < data->separator; i++)
+	{
+		if (isleaf)
+			GinDataPageAddItemPointer(lpage, (ItemPointer) ptr, InvalidOffsetNumber);
+		else
+			GinDataPageAddPostingItem(lpage, (PostingItem *) ptr, InvalidOffsetNumber);
+		ptr += sizeofitem;
 	}
 
-	UnlockReleaseBuffer(buffer);
+	for (i = data->separator; i < data->nitem; i++)
+	{
+		if (isleaf)
+			GinDataPageAddItemPointer(rpage, (ItemPointer) ptr, InvalidOffsetNumber);
+		else
+			GinDataPageAddPostingItem(rpage, (PostingItem *) ptr, InvalidOffsetNumber);
+		ptr += sizeofitem;
+	}
+
+	/* set up right key */
+	bound = GinDataPageGetRightBound(lpage);
+	if (isleaf)
+		*bound = *GinDataPageGetItemPointer(lpage, GinPageGetOpaque(lpage)->maxoff);
+	else
+		*bound = GinDataPageGetPostingItem(lpage, GinPageGetOpaque(lpage)->maxoff)->key;
+
+	bound = GinDataPageGetRightBound(rpage);
+	*bound = data->rightbound;
 }
 
 static void
@@ -255,14 +301,30 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	Page		lpage,
 				rpage;
 	uint32		flags = 0;
+	char	   *payload;
+	bool		isLeaf = (data->flags & GIN_SPLIT_ISLEAF) != 0;
+	bool		isData = (data->flags & GIN_SPLIT_ISDATA) != 0;
+	bool		isRoot = (data->flags & GIN_SPLIT_ROOT) != 0;
+
+	payload = XLogRecGetData(record) + sizeof(ginxlogSplit);
 
-	if (data->isLeaf)
+	/*
+	 * First clear incomplete-split flag on child page if this finishes
+	 * a split
+	 */
+	if (!isLeaf)
+	{
+		if (record->xl_info & XLR_BKP_BLOCK(0))
+			(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		else
+			ginRedoClearIncompleteSplit(lsn, data->node, data->leftChildBlkno);
+	}
+
+	if (isLeaf)
 		flags |= GIN_LEAF;
-	if (data->isData)
-		flags |= GIN_DATA;
 
-	/* Backup blocks are not used in split records */
-	Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
+	if (isData)
+		flags |= GIN_DATA;
 
 	lbuffer = XLogReadBuffer(data->node, data->lblkno, true);
 	Assert(BufferIsValid(lbuffer));
@@ -277,62 +339,11 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	GinPageGetOpaque(lpage)->rightlink = BufferGetBlockNumber(rbuffer);
 	GinPageGetOpaque(rpage)->rightlink = data->rrlink;
 
-	if (data->isData)
-	{
-		char	   *ptr = XLogRecGetData(record) + sizeof(ginxlogSplit);
-		Size		sizeofitem = GinSizeOfDataPageItem(lpage);
-		OffsetNumber i;
-		ItemPointer bound;
-
-		for (i = 0; i < data->separator; i++)
-		{
-			if (data->isLeaf)
-				GinDataPageAddItemPointer(lpage, (ItemPointer) ptr, InvalidOffsetNumber);
-			else
-				GinDataPageAddPostingItem(lpage, (PostingItem *) ptr, InvalidOffsetNumber);
-			ptr += sizeofitem;
-		}
-
-		for (i = data->separator; i < data->nitem; i++)
-		{
-			if (data->isLeaf)
-				GinDataPageAddItemPointer(rpage, (ItemPointer) ptr, InvalidOffsetNumber);
-			else
-				GinDataPageAddPostingItem(rpage, (PostingItem *) ptr, InvalidOffsetNumber);
-			ptr += sizeofitem;
-		}
-
-		/* set up right key */
-		bound = GinDataPageGetRightBound(lpage);
-		if (data->isLeaf)
-			*bound = *GinDataPageGetItemPointer(lpage, GinPageGetOpaque(lpage)->maxoff);
-		else
-			*bound = GinDataPageGetPostingItem(lpage, GinPageGetOpaque(lpage)->maxoff)->key;
-
-		bound = GinDataPageGetRightBound(rpage);
-		*bound = data->rightbound;
-	}
+	/* Do the tree-type specific portion to restore the page contents */
+	if (data->flags & GIN_SPLIT_ISDATA)
+		ginRedoSplitData(lbuffer, rbuffer, payload);
 	else
-	{
-		IndexTuple	itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogSplit));
-		OffsetNumber i;
-
-		for (i = 0; i < data->separator; i++)
-		{
-			if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-			itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
-		}
-
-		for (i = data->separator; i < data->nitem; i++)
-		{
-			if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-			itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
-		}
-	}
+		ginRedoSplitEntry(lbuffer, rbuffer, payload);
 
 	PageSetLSN(rpage, lsn);
 	MarkBufferDirty(rbuffer);
@@ -340,25 +351,31 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	PageSetLSN(lpage, lsn);
 	MarkBufferDirty(lbuffer);
 
-	if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		forgetIncompleteSplit(data->node, data->leftChildBlkno, data->updateBlkno);
-
-	if (data->isRootSplit)
+	if (isRoot)
 	{
-		Buffer		rootBuf = XLogReadBuffer(data->node, data->rootBlkno, true);
+		BlockNumber	rootBlkno = data->rrlink;
+		Buffer		rootBuf = XLogReadBuffer(data->node, rootBlkno, true);
 		Page		rootPage = BufferGetPage(rootBuf);
 
 		GinInitBuffer(rootBuf, flags & ~GIN_LEAF);
 
-		if (data->isData)
+		if (isData)
 		{
-			Assert(data->rootBlkno != GIN_ROOT_BLKNO);
-			ginDataFillRoot(NULL, rootBuf, lbuffer, rbuffer);
+			Assert(rootBlkno != GIN_ROOT_BLKNO);
+			ginDataFillRoot(NULL, BufferGetPage(rootBuf),
+							BufferGetBlockNumber(lbuffer),
+							BufferGetPage(lbuffer),
+							BufferGetBlockNumber(rbuffer),
+							BufferGetPage(rbuffer));
 		}
 		else
 		{
-			Assert(data->rootBlkno == GIN_ROOT_BLKNO);
-			ginEntryFillRoot(NULL, rootBuf, lbuffer, rbuffer);
+			Assert(rootBlkno == GIN_ROOT_BLKNO);
+			ginEntryFillRoot(NULL, BufferGetPage(rootBuf),
+							 BufferGetBlockNumber(lbuffer),
+							 BufferGetPage(lbuffer),
+							 BufferGetBlockNumber(rbuffer),
+							 BufferGetPage(rbuffer));
 		}
 
 		PageSetLSN(rootPage, lsn);
@@ -366,8 +383,6 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 		MarkBufferDirty(rootBuf);
 		UnlockReleaseBuffer(rootBuf);
 	}
-	else
-		pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno);
 
 	UnlockReleaseBuffer(rbuffer);
 	UnlockReleaseBuffer(lbuffer);
@@ -711,6 +726,7 @@ void
 gin_redo(XLogRecPtr lsn, XLogRecord *record)
 {
 	uint8		info = record->xl_info & ~XLR_INFO_MASK;
+	MemoryContext oldCtx;
 
 	/*
 	 * GIN indexes do not require any conflict processing. NB: If we ever
@@ -718,7 +734,7 @@ gin_redo(XLogRecPtr lsn, XLogRecord *record)
 	 * killed tuples outside VACUUM, we'll need to handle that here.
 	 */
 
-	topCtx = MemoryContextSwitchTo(opCtx);
+	oldCtx = MemoryContextSwitchTo(opCtx);
 	switch (info)
 	{
 		case XLOG_GIN_CREATE_INDEX:
@@ -751,15 +767,13 @@ gin_redo(XLogRecPtr lsn, XLogRecord *record)
 		default:
 			elog(PANIC, "gin_redo: unknown op code %u", info);
 	}
-	MemoryContextSwitchTo(topCtx);
+	MemoryContextSwitchTo(oldCtx);
 	MemoryContextReset(opCtx);
 }
 
 void
 gin_xlog_startup(void)
 {
-	incomplete_splits = NIL;
-
 	opCtx = AllocSetContextCreate(CurrentMemoryContext,
 								  "GIN recovery temporary context",
 								  ALLOCSET_DEFAULT_MINSIZE,
@@ -767,84 +781,8 @@ gin_xlog_startup(void)
 								  ALLOCSET_DEFAULT_MAXSIZE);
 }
 
-static void
-ginContinueSplit(ginIncompleteSplit *split)
-{
-	GinBtreeData btree;
-	GinState	ginstate;
-	Relation	reln;
-	Buffer		buffer;
-	GinBtreeStack stack;
-
-	/*
-	 * elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u",  split->rootBlkno,
-	 * split->leftBlkno, split->rightBlkno);
-	 */
-	buffer = XLogReadBuffer(split->node, split->leftBlkno, false);
-
-	/*
-	 * Failure should be impossible here, because we wrote the page earlier.
-	 */
-	if (!BufferIsValid(buffer))
-		elog(PANIC, "ginContinueSplit: left block %u not found",
-			 split->leftBlkno);
-
-	reln = CreateFakeRelcacheEntry(split->node);
-
-	if (split->rootBlkno == GIN_ROOT_BLKNO)
-	{
-		MemSet(&ginstate, 0, sizeof(ginstate));
-		ginstate.index = reln;
-
-		ginPrepareEntryScan(&btree,
-							InvalidOffsetNumber, (Datum) 0, GIN_CAT_NULL_KEY,
-							&ginstate);
-	}
-	else
-	{
-		ginPrepareDataScan(&btree, reln);
-	}
-
-	stack.blkno = split->leftBlkno;
-	stack.buffer = buffer;
-	stack.off = InvalidOffsetNumber;
-	stack.parent = NULL;
-
-	ginFindParents(&btree, &stack, split->rootBlkno);
-
-	btree.prepareDownlink(&btree, buffer);
-	ginInsertValue(&btree, stack.parent, NULL);
-
-	FreeFakeRelcacheEntry(reln);
-
-	UnlockReleaseBuffer(buffer);
-}
-
 void
 gin_xlog_cleanup(void)
 {
-	ListCell   *l;
-	MemoryContext topCtx;
-
-	topCtx = MemoryContextSwitchTo(opCtx);
-
-	foreach(l, incomplete_splits)
-	{
-		ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
-
-		ginContinueSplit(split);
-		MemoryContextReset(opCtx);
-	}
-
-	MemoryContextSwitchTo(topCtx);
 	MemoryContextDelete(opCtx);
-	incomplete_splits = NIL;
-}
-
-bool
-gin_safe_restartpoint(void)
-{
-	if (incomplete_splits)
-		return false;
-	return true;
 }
diff --git a/src/backend/access/rmgrdesc/gindesc.c b/src/backend/access/rmgrdesc/gindesc.c
index 391f75f..c534c3a 100644
--- a/src/backend/access/rmgrdesc/gindesc.c
+++ b/src/backend/access/rmgrdesc/gindesc.c
@@ -41,20 +41,45 @@ gin_desc(StringInfo buf, uint8 xl_info, char *rec)
 			desc_node(buf, ((ginxlogCreatePostingTree *) rec)->node, ((ginxlogCreatePostingTree *) rec)->blkno);
 			break;
 		case XLOG_GIN_INSERT:
-			appendStringInfoString(buf, "Insert item, ");
-			desc_node(buf, ((ginxlogInsert *) rec)->node, ((ginxlogInsert *) rec)->blkno);
-			appendStringInfo(buf, " offset: %u nitem: %u isdata: %c isleaf %c isdelete %c updateBlkno:%u",
-							 ((ginxlogInsert *) rec)->offset,
-							 ((ginxlogInsert *) rec)->nitem,
-							 (((ginxlogInsert *) rec)->isData) ? 'T' : 'F',
-							 (((ginxlogInsert *) rec)->isLeaf) ? 'T' : 'F',
-							 (((ginxlogInsert *) rec)->isDelete) ? 'T' : 'F',
-							 ((ginxlogInsert *) rec)->updateBlkno);
+			{
+				ginxlogInsert *xlrec = (ginxlogInsert *) rec;
+				char	*payload = rec + sizeof(ginxlogInsert);
+
+				appendStringInfoString(buf, "Insert item, ");
+				desc_node(buf, xlrec->node, xlrec->blkno);
+				appendStringInfo(buf, " offset: %u isdata: %c isleaf: %c",
+								 xlrec->offset,
+								 (xlrec->flags & GIN_SPLIT_ISDATA) ? 'T' : 'F',
+								 (xlrec->flags & GIN_SPLIT_ISLEAF) ? 'T' : 'F');
+				if (!(xlrec->flags & GIN_SPLIT_ISLEAF))
+				{
+					BlockNumber leftChildBlkno;
+					BlockNumber rightChildBlkno;
+
+					memcpy(&leftChildBlkno, payload, sizeof(BlockNumber));
+					payload += sizeof(BlockNumber);
+					memcpy(&rightChildBlkno, payload, sizeof(BlockNumber));
+					payload += sizeof(BlockNumber);
+					appendStringInfo(buf, " children: %u/%u",
+									 leftChildBlkno, rightChildBlkno);
+				}
+				if (!(xlrec->flags & GIN_SPLIT_ISDATA))
+					appendStringInfo(buf, " isdelete: %c",
+									 (((ginxlogInsertEntry *) payload)->isDelete) ? 'T' : 'F');
+				else if (xlrec->flags & GIN_SPLIT_ISLEAF)
+					appendStringInfo(buf, " nitem: %u",
+									 (((ginxlogInsertDataLeaf *) payload)->nitem) ? 'T' : 'F');
+				else
+					appendStringInfo(buf, " pitem: %u-%u/%u",
+									 PostingItemGetBlockNumber((PostingItem *) payload),
+									 ItemPointerGetBlockNumber(&((PostingItem *) payload)->key),
+									 ItemPointerGetOffsetNumber(&((PostingItem *) payload)->key));
+			}
 			break;
 		case XLOG_GIN_SPLIT:
 			appendStringInfoString(buf, "Page split, ");
 			desc_node(buf, ((ginxlogSplit *) rec)->node, ((ginxlogSplit *) rec)->lblkno);
-			appendStringInfo(buf, " isrootsplit: %c", (((ginxlogSplit *) rec)->isRootSplit) ? 'T' : 'F');
+			appendStringInfo(buf, " isrootsplit: %c", (((ginxlogSplit *) rec)->flags & GIN_SPLIT_ROOT) ? 'T' : 'F');
 			break;
 		case XLOG_GIN_VACUUM_PAGE:
 			appendStringInfoString(buf, "Vacuum page, ");
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index b6cb48d..7dcb0e0 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -58,6 +58,5 @@ extern void gin_redo(XLogRecPtr lsn, XLogRecord *record);
 extern void gin_desc(StringInfo buf, uint8 xl_info, char *rec);
 extern void gin_xlog_startup(void);
 extern void gin_xlog_cleanup(void);
-extern bool gin_safe_restartpoint(void);
 
 #endif   /* GIN_H */
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3952935..9bce245 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -48,6 +48,7 @@ typedef GinPageOpaqueData *GinPageOpaque;
 #define GIN_META		  (1 << 3)
 #define GIN_LIST		  (1 << 4)
 #define GIN_LIST_FULLROW  (1 << 5)		/* makes sense only on GIN_LIST page */
+#define GIN_INCOMPLETE_SPLIT (1 << 6)	/* page was split, but parent not updated */
 
 /* Page numbers of fixed-location pages */
 #define GIN_METAPAGE_BLKNO	(0)
@@ -119,6 +120,7 @@ typedef struct GinMetaPageData
 #define GinPageIsDeleted(page) ( GinPageGetOpaque(page)->flags & GIN_DELETED)
 #define GinPageSetDeleted(page)    ( GinPageGetOpaque(page)->flags |= GIN_DELETED)
 #define GinPageSetNonDeleted(page) ( GinPageGetOpaque(page)->flags &= ~GIN_DELETED)
+#define GinPageIsIncompleteSplit(page) ( GinPageGetOpaque(page)->flags & GIN_INCOMPLETE_SPLIT)
 
 #define GinPageRightMost(page) ( GinPageGetOpaque(page)->rightlink == InvalidBlockNumber)
 
@@ -336,41 +338,77 @@ typedef struct ginxlogInsert
 {
 	RelFileNode node;
 	BlockNumber blkno;
-	BlockNumber updateBlkno;
+	uint16		flags;		/* GIN_SPLIT_ISLEAF and/or GIN_SPLIT_ISDATA */
 	OffsetNumber offset;
-	bool		isDelete;
-	bool		isData;
-	bool		isLeaf;
-	OffsetNumber nitem;
 
 	/*
-	 * follows: tuples or ItemPointerData or PostingItem or list of
-	 * ItemPointerData
+	 * FOLLOWS:
+	 * if !isLeaf, left and right child block numbers of the child pages whose
+	 * split this insertion finishes. (non-aligned!)
+	 */
+
+	/*
+	 * follows: one of the following structs, depending on tree type.
+	 *
+	 * NB: the below structs are only 16-bit aligned when appended to a
+	 * ginxlogInsert struct! Beware of adding fields to them that require
+	 * stricter alignment.
 	 */
 } ginxlogInsert;
 
+typedef struct
+{
+	bool		isDelete;
+	IndexTupleData tuple;	/* variable length */
+} ginxlogInsertEntry;
+
+typedef struct
+{
+	OffsetNumber nitem;
+	ItemPointerData items[1]; /* variable length */
+} ginxlogInsertDataLeaf;
+
+/* In an insert to an internal data page, the payload is a PostingItem */
+
+
 #define XLOG_GIN_SPLIT	0x30
 
 typedef struct ginxlogSplit
 {
 	RelFileNode node;
 	BlockNumber lblkno;
-	BlockNumber rootBlkno;
 	BlockNumber rblkno;
-	BlockNumber rrlink;
+	BlockNumber rrlink;				/* right link, or root's blocknumber if root split */
+	BlockNumber	leftChildBlkno;		/* valid on a non-leaf split */
+	BlockNumber	rightChildBlkno;
+	uint16		flags;
+
+	/* follows: one of the following structs */
+} ginxlogSplit;
+
+/*
+ * Flags used in ginxlogInsert and ginxlogSplit records
+ */
+#define GIN_SPLIT_ISDATA	0x01
+#define GIN_SPLIT_ISLEAF	0x02
+#define GIN_SPLIT_ROOT		0x04
+
+typedef struct
+{
 	OffsetNumber separator;
 	OffsetNumber nitem;
 
-	bool		isData;
-	bool		isLeaf;
-	bool		isRootSplit;
+	/* FOLLOWS: IndexTuples */
+} ginxlogSplitEntry;
 
-	BlockNumber leftChildBlkno;
-	BlockNumber updateBlkno;
+typedef struct
+{
+	OffsetNumber separator;
+	OffsetNumber nitem;
+	ItemPointerData rightbound;
 
-	ItemPointerData rightbound; /* used only in posting tree */
-	/* follows: list of tuple or ItemPointerData or PostingItem */
-} ginxlogSplit;
+	/* FOLLOWS: array of ItemPointers (for leaf) or PostingItems (non-leaf) */
+} ginxlogSplitData;
 
 #define XLOG_GIN_VACUUM_PAGE	0x40
 
@@ -485,10 +523,10 @@ typedef struct GinBtreeData
 
 	/* insert methods */
 	OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
-	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, XLogRecData **);
-	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, XLogRecData **);
-	void		(*prepareDownlink) (GinBtree, Buffer);
-	void		(*fillRoot) (GinBtree, Buffer, Buffer, Buffer);
+	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
+	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
+	void		*(*prepareDownlink) (GinBtree, Buffer);
+	void		(*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, Page);
 
 	bool		isData;
 
@@ -497,29 +535,37 @@ typedef struct GinBtreeData
 	bool		fullScan;
 	bool		isBuild;
 
-	BlockNumber rightblkno;
-
-	/* Entry options */
+	/* Search key for Entry tree */
 	OffsetNumber entryAttnum;
 	Datum		entryKey;
 	GinNullCategory entryCategory;
+
+	/* Search key for data tree (posting tree) */
+	ItemPointerData itemptr;
+} GinBtreeData;
+
+/* Entry options */
+typedef struct
+{
 	IndexTuple	entry;
 	bool		isDelete;
+} GinBtreeEntryInsertData;
 
-	/* Data (posting tree) options */
+/* For data (posting tree) leaves */
+typedef struct
+{
 	ItemPointerData *items;
 	uint32		nitem;
 	uint32		curitem;
+} GinBtreeDataLeafInsertData;
 
-	PostingItem pitem;
-} GinBtreeData;
+/* For internal data (posting tree) pages, use PostingItem */
 
 extern GinBtreeStack *ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
-			   GinStatsData *buildStats);
-extern void ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno);
+			   void *insertdata, GinStatsData *buildStats);
 
 /* ginentrypage.c */
 extern IndexTuple GinFormTuple(GinState *ginstate,
@@ -529,7 +575,7 @@ extern void GinShortenTuple(IndexTuple itup, uint32 nipd);
 extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 					Datum key, GinNullCategory category,
 					GinState *ginstate);
-extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
+extern void ginEntryFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 
 /* gindatapage.c */
 extern BlockNumber createPostingTree(Relation index,
@@ -542,7 +588,7 @@ extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
 extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
-extern void ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
+extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 extern void ginPrepareDataScan(GinBtree btree, Relation index);
 
 /* ginscan.c */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 7ad71b3..166689d 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -38,7 +38,7 @@ PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL)
 PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, NULL, NULL, NULL)
 PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint)
 PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, NULL, NULL, NULL)
-PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint)
+PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, NULL)
 PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL)
 PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, NULL, NULL, NULL)
 PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_xlog_startup, spg_xlog_cleanup, NULL)
-- 
1.8.4.rc3

0001-Further-GIN-refactoring.patchtext/x-diff; name=0001-Further-GIN-refactoring.patchDownload
>From 38760dd746cc5f345edf9a29654b9b2917377a44 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 13 Nov 2013 18:15:14 +0200
Subject: [PATCH 1/4] Further GIN refactoring.

Merge some functions that were always called together. Makes the code
little bit more readable.
---
 src/backend/access/gin/ginbtree.c     | 54 +++++++++---------------
 src/backend/access/gin/gindatapage.c  | 79 ++++++++++++++---------------------
 src/backend/access/gin/ginentrypage.c |  5 +--
 src/backend/access/gin/ginget.c       | 24 +++++------
 src/backend/access/gin/gininsert.c    | 21 ++++------
 src/include/access/gin_private.h      | 19 ++-------
 6 files changed, 74 insertions(+), 128 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 010812d..0d93c52 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -52,52 +52,38 @@ ginTraverseLock(Buffer buffer, bool searchMode)
 	return access;
 }
 
-GinBtreeStack *
-ginPrepareFindLeafPage(GinBtree btree, BlockNumber blkno)
-{
-	GinBtreeStack *stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
-
-	stack->blkno = blkno;
-	stack->buffer = ReadBuffer(btree->index, stack->blkno);
-	stack->parent = NULL;
-	stack->predictNumber = 1;
-
-	ginTraverseLock(stack->buffer, btree->searchMode);
-
-	return stack;
-}
-
 /*
- * Locates leaf page contained tuple
+ * Descends the tree to the leaf page that contains or would contain the
+ * key we're searching for. The key should already be filled in 'btree',
+ * in tree-type specific manner. If btree->fullScan is true, descends to the
+ * leftmost leaf page.
+ *
+ * If 'searchmode' is false, on return stack->buffer is exclusively locked,
+ * and the stack represents the full path to the root. Otherwise stack->buffer
+ * is share-locked, and stack->parent is NULL.
  */
 GinBtreeStack *
-ginFindLeafPage(GinBtree btree, GinBtreeStack *stack)
+ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode)
 {
-	bool		isfirst = TRUE;
-	BlockNumber rootBlkno;
+	GinBtreeStack *stack;
 
-	if (!stack)
-		stack = ginPrepareFindLeafPage(btree, GIN_ROOT_BLKNO);
-	rootBlkno = stack->blkno;
+	stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
+	stack->blkno = rootBlkno;
+	stack->buffer = ReadBuffer(btree->index, rootBlkno);
+	stack->parent = NULL;
+	stack->predictNumber = 1;
 
 	for (;;)
 	{
 		Page		page;
 		BlockNumber child;
-		int			access = GIN_SHARE;
+		int			access;
 
 		stack->off = InvalidOffsetNumber;
 
 		page = BufferGetPage(stack->buffer);
 
-		if (isfirst)
-		{
-			if (GinPageIsLeaf(page) && !btree->searchMode)
-				access = GIN_EXCLUSIVE;
-			isfirst = FALSE;
-		}
-		else
-			access = ginTraverseLock(stack->buffer, btree->searchMode);
+		access = ginTraverseLock(stack->buffer, searchMode);
 
 		/*
 		 * ok, page is correctly locked, we should check to move right ..,
@@ -127,7 +113,7 @@ ginFindLeafPage(GinBtree btree, GinBtreeStack *stack)
 		Assert(child != InvalidBlockNumber);
 		Assert(stack->blkno != child);
 
-		if (btree->searchMode)
+		if (searchMode)
 		{
 			/* in search mode we may forget path to leaf */
 			stack->blkno = child;
@@ -251,7 +237,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 		return;
 	}
 
-	leftmostBlkno = blkno = btree->getLeftMostPage(btree, page);
+	leftmostBlkno = blkno = btree->getLeftMostChild(btree, page);
 	LockBuffer(root->buffer, GIN_UNLOCK);
 	Assert(blkno != InvalidBlockNumber);
 
@@ -263,7 +249,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 		if (GinPageIsLeaf(page))
 			elog(ERROR, "Lost path");
 
-		leftmostBlkno = btree->getLeftMostPage(btree, page);
+		leftmostBlkno = btree->getLeftMostChild(btree, page);
 
 		while ((offset = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) == InvalidOffsetNumber)
 		{
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index c9506cc..221b39e 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -54,7 +54,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
 	{
 		stack->off = FirstOffsetNumber;
 		stack->predictNumber *= GinPageGetOpaque(page)->maxoff;
-		return btree->getLeftMostPage(btree, page);
+		return btree->getLeftMostChild(btree, page);
 	}
 
 	low = FirstOffsetNumber;
@@ -680,17 +680,10 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
 	 */
 	if (nitems > nrootitems)
 	{
-		GinPostingTreeScan *gdi;
-
-		gdi = ginPrepareScanPostingTree(index, blkno, FALSE);
-		gdi->btree.isBuild = (buildStats != NULL);
-
-		ginInsertItemPointers(gdi,
+		ginInsertItemPointers(index, blkno,
 							  items + nrootitems,
 							  nitems - nrootitems,
 							  buildStats);
-
-		pfree(gdi);
 	}
 
 	return blkno;
@@ -704,76 +697,66 @@ ginPrepareDataScan(GinBtree btree, Relation index)
 	btree->index = index;
 
 	btree->findChildPage = dataLocateItem;
+	btree->getLeftMostChild = dataGetLeftMostPage;
 	btree->isMoveRight = dataIsMoveRight;
 	btree->findItem = dataLocateLeafItem;
 	btree->findChildPtr = dataFindChildPtr;
-	btree->getLeftMostPage = dataGetLeftMostPage;
 	btree->placeToPage = dataPlaceToPage;
 	btree->splitPage = dataSplitPage;
 	btree->fillRoot = ginDataFillRoot;
 
 	btree->isData = TRUE;
-	btree->searchMode = FALSE;
 	btree->isDelete = FALSE;
 	btree->fullScan = FALSE;
 	btree->isBuild = FALSE;
 }
 
-GinPostingTreeScan *
-ginPrepareScanPostingTree(Relation index, BlockNumber rootBlkno, bool searchMode)
-{
-	GinPostingTreeScan *gdi = (GinPostingTreeScan *) palloc0(sizeof(GinPostingTreeScan));
-
-	ginPrepareDataScan(&gdi->btree, index);
-
-	gdi->btree.searchMode = searchMode;
-	gdi->btree.fullScan = searchMode;
-
-	gdi->stack = ginPrepareFindLeafPage(&gdi->btree, rootBlkno);
-
-	return gdi;
-}
-
 /*
  * Inserts array of item pointers, may execute several tree scan (very rare)
  */
 void
-ginInsertItemPointers(GinPostingTreeScan *gdi,
+ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats)
 {
-	BlockNumber rootBlkno = gdi->stack->blkno;
+	GinBtreeData btree;
+	GinBtreeStack *stack;
 
-	gdi->btree.items = items;
-	gdi->btree.nitem = nitem;
-	gdi->btree.curitem = 0;
+	ginPrepareDataScan(&btree, index);
+	btree.isBuild = (buildStats != NULL);
+	btree.items = items;
+	btree.nitem = nitem;
+	btree.curitem = 0;
 
-	while (gdi->btree.curitem < gdi->btree.nitem)
+	while (btree.curitem < btree.nitem)
 	{
-		if (!gdi->stack)
-			gdi->stack = ginPrepareFindLeafPage(&gdi->btree, rootBlkno);
+		stack = ginFindLeafPage(&btree, rootBlkno, false);
 
-		gdi->stack = ginFindLeafPage(&gdi->btree, gdi->stack);
-
-		if (gdi->btree.findItem(&(gdi->btree), gdi->stack))
+		if (btree.findItem(&btree, stack))
 		{
 			/*
-			 * gdi->btree.items[gdi->btree.curitem] already exists in index
+			 * btree.items[btree.curitem] already exists in index
 			 */
-			gdi->btree.curitem++;
-			LockBuffer(gdi->stack->buffer, GIN_UNLOCK);
-			freeGinBtreeStack(gdi->stack);
+			btree.curitem++;
+			LockBuffer(stack->buffer, GIN_UNLOCK);
+			freeGinBtreeStack(stack);
 		}
 		else
-			ginInsertValue(&(gdi->btree), gdi->stack, buildStats);
-
-		gdi->stack = NULL;
+			ginInsertValue(&btree, stack, buildStats);
 	}
 }
 
-Buffer
-ginScanBeginPostingTree(GinPostingTreeScan *gdi)
+GinBtreeStack *
+ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno)
 {
-	gdi->stack = ginFindLeafPage(&gdi->btree, gdi->stack);
-	return gdi->stack->buffer;
+	GinBtreeData btree;
+	GinBtreeStack *stack;
+
+	ginPrepareDataScan(&btree, index);
+
+	btree.fullScan = TRUE;
+
+	stack = ginFindLeafPage(&btree, rootBlkno, TRUE);
+
+	return stack;
 }
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 0ed0a3d..a22fd32 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -258,7 +258,7 @@ entryLocateEntry(GinBtree btree, GinBtreeStack *stack)
 	{
 		stack->off = FirstOffsetNumber;
 		stack->predictNumber *= PageGetMaxOffsetNumber(page);
-		return btree->getLeftMostPage(btree, page);
+		return btree->getLeftMostChild(btree, page);
 	}
 
 	low = FirstOffsetNumber;
@@ -729,16 +729,15 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 	btree->ginstate = ginstate;
 
 	btree->findChildPage = entryLocateEntry;
+	btree->getLeftMostChild = entryGetLeftMostPage;
 	btree->isMoveRight = entryIsMoveRight;
 	btree->findItem = entryLocateLeafEntry;
 	btree->findChildPtr = entryFindChildPtr;
-	btree->getLeftMostPage = entryGetLeftMostPage;
 	btree->placeToPage = entryPlaceToPage;
 	btree->splitPage = entrySplitPage;
 	btree->fillRoot = ginEntryFillRoot;
 
 	btree->isData = FALSE;
-	btree->searchMode = FALSE;
 	btree->fullScan = FALSE;
 	btree->isBuild = FALSE;
 
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index acbdc5f..2a3991f 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -124,18 +124,16 @@ static void
 scanPostingTree(Relation index, GinScanEntry scanEntry,
 				BlockNumber rootPostingTree)
 {
-	GinPostingTreeScan *gdi;
+	GinBtreeStack *stack;
 	Buffer		buffer;
 	Page		page;
 
 	/* Descend to the leftmost leaf page */
-	gdi = ginPrepareScanPostingTree(index, rootPostingTree, TRUE);
-
-	buffer = ginScanBeginPostingTree(gdi);
+	stack = ginScanBeginPostingTree(index, rootPostingTree);
+	buffer = stack->buffer;
 	IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
 
-	freeGinBtreeStack(gdi->stack);
-	pfree(gdi);
+	freeGinBtreeStack(stack);
 
 	/*
 	 * Loop iterates through all leaf pages of posting tree
@@ -376,8 +374,7 @@ restartScanEntry:
 	ginPrepareEntryScan(&btreeEntry, entry->attnum,
 						entry->queryKey, entry->queryCategory,
 						ginstate);
-	btreeEntry.searchMode = TRUE;
-	stackEntry = ginFindLeafPage(&btreeEntry, NULL);
+	stackEntry = ginFindLeafPage(&btreeEntry, GIN_ROOT_BLKNO, true);
 	page = BufferGetPage(stackEntry->buffer);
 	needUnlock = TRUE;
 
@@ -427,7 +424,7 @@ restartScanEntry:
 		if (GinIsPostingTree(itup))
 		{
 			BlockNumber rootPostingTree = GinGetPostingTree(itup);
-			GinPostingTreeScan *gdi;
+			GinBtreeStack *stack;
 			Page		page;
 
 			/*
@@ -439,9 +436,9 @@ restartScanEntry:
 			 */
 			LockBuffer(stackEntry->buffer, GIN_UNLOCK);
 			needUnlock = FALSE;
-			gdi = ginPrepareScanPostingTree(ginstate->index, rootPostingTree, TRUE);
 
-			entry->buffer = ginScanBeginPostingTree(gdi);
+			stack = ginScanBeginPostingTree(ginstate->index, rootPostingTree);
+			entry->buffer = stack->buffer;
 
 			/*
 			 * We keep buffer pinned because we need to prevent deletion of
@@ -451,7 +448,7 @@ restartScanEntry:
 			IncrBufferRefCount(entry->buffer);
 
 			page = BufferGetPage(entry->buffer);
-			entry->predictNumberResult = gdi->stack->predictNumber * GinPageGetOpaque(page)->maxoff;
+			entry->predictNumberResult = stack->predictNumber * GinPageGetOpaque(page)->maxoff;
 
 			/*
 			 * Keep page content in memory to prevent durable page locking
@@ -463,8 +460,7 @@ restartScanEntry:
 				   GinPageGetOpaque(page)->maxoff * sizeof(ItemPointerData));
 
 			LockBuffer(entry->buffer, GIN_UNLOCK);
-			freeGinBtreeStack(gdi->stack);
-			pfree(gdi);
+			freeGinBtreeStack(stack);
 			entry->isFinished = FALSE;
 		}
 		else if (GinGetNPosting(itup) > 0)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fb7b4ea..0a2883a 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -81,7 +81,6 @@ addItemPointersToLeafTuple(GinState *ginstate,
 	{
 		/* posting list would be too big, convert to posting tree */
 		BlockNumber postingRoot;
-		GinPostingTreeScan *gdi;
 
 		/*
 		 * Initialize posting tree with the old tuple's posting list.  It's
@@ -94,12 +93,9 @@ addItemPointersToLeafTuple(GinState *ginstate,
 										buildStats);
 
 		/* Now insert the TIDs-to-be-added into the posting tree */
-		gdi = ginPrepareScanPostingTree(ginstate->index, postingRoot, FALSE);
-		gdi->btree.isBuild = (buildStats != NULL);
-
-		ginInsertItemPointers(gdi, items, nitem, buildStats);
-
-		pfree(gdi);
+		ginInsertItemPointers(ginstate->index, postingRoot,
+							  items, nitem,
+							  buildStats);
 
 		/* And build a new posting-tree-only result tuple */
 		res = GinFormTuple(ginstate, attnum, key, category, NULL, 0, true);
@@ -177,7 +173,7 @@ ginEntryInsert(GinState *ginstate,
 
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
-	stack = ginFindLeafPage(&btree, NULL);
+	stack = ginFindLeafPage(&btree, GIN_ROOT_BLKNO, false);
 	page = BufferGetPage(stack->buffer);
 
 	if (btree.findItem(&btree, stack))
@@ -189,18 +185,15 @@ ginEntryInsert(GinState *ginstate,
 		{
 			/* add entries to existing posting tree */
 			BlockNumber rootPostingTree = GinGetPostingTree(itup);
-			GinPostingTreeScan *gdi;
 
 			/* release all stack */
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 			freeGinBtreeStack(stack);
 
 			/* insert into posting tree */
-			gdi = ginPrepareScanPostingTree(ginstate->index, rootPostingTree, FALSE);
-			gdi->btree.isBuild = (buildStats != NULL);
-			ginInsertItemPointers(gdi, items, nitem, buildStats);
-			pfree(gdi);
-
+			ginInsertItemPointers(ginstate->index, rootPostingTree,
+								  items, nitem,
+								  buildStats);
 			return;
 		}
 
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index d22cc62..7491d7c 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -479,18 +479,17 @@ typedef struct GinBtreeData
 {
 	/* search methods */
 	BlockNumber (*findChildPage) (GinBtree, GinBtreeStack *);
+	BlockNumber (*getLeftMostChild) (GinBtree, Page);
 	bool		(*isMoveRight) (GinBtree, Page);
 	bool		(*findItem) (GinBtree, GinBtreeStack *);
 
 	/* insert methods */
 	OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
-	BlockNumber (*getLeftMostPage) (GinBtree, Page);
 	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, XLogRecData **);
 	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, XLogRecData **);
 	void		(*fillRoot) (GinBtree, Buffer, Buffer, Buffer);
 
 	bool		isData;
-	bool		searchMode;
 
 	Relation	index;
 	GinState   *ginstate;		/* not valid in a data scan */
@@ -514,8 +513,7 @@ typedef struct GinBtreeData
 	PostingItem pitem;
 } GinBtreeData;
 
-extern GinBtreeStack *ginPrepareFindLeafPage(GinBtree btree, BlockNumber blkno);
-extern GinBtreeStack *ginFindLeafPage(GinBtree btree, GinBtreeStack *stack);
+extern GinBtreeStack *ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
@@ -540,19 +538,10 @@ extern BlockNumber createPostingTree(Relation index,
 extern void GinDataPageAddItemPointer(Page page, ItemPointer data, OffsetNumber offset);
 extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
 extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
-
-typedef struct
-{
-	GinBtreeData btree;
-	GinBtreeStack *stack;
-} GinPostingTreeScan;
-
-extern GinPostingTreeScan *ginPrepareScanPostingTree(Relation index,
-						  BlockNumber rootBlkno, bool searchMode);
-extern void ginInsertItemPointers(GinPostingTreeScan *gdi,
+extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
-extern Buffer ginScanBeginPostingTree(GinPostingTreeScan *gdi);
+extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
 extern void ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
 extern void ginPrepareDataScan(GinBtree btree, Relation index);
 
-- 
1.8.4.rc3

0002-Refactor-the-internal-GIN-B-tree-interface-for-formi.patchtext/x-diff; name=0002-Refactor-the-internal-GIN-B-tree-interface-for-formi.patchDownload
>From fa3ecd275f51e073a385b5124763588cfb268e11 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 13 Nov 2013 18:15:22 +0200
Subject: [PATCH 2/4] Refactor the internal GIN B-tree interface for forming a
 downlink.

This creates a new gin-btree callback function for creating a downlink for
a page. Previously, ginxlog.c duplicated the logic used during normal
operation.
---
 src/backend/access/gin/ginbtree.c     |  1 +
 src/backend/access/gin/gindatapage.c  | 17 +++++++++++----
 src/backend/access/gin/ginentrypage.c | 40 ++++++++++++++++-------------------
 src/backend/access/gin/ginxlog.c      | 15 ++-----------
 src/include/access/gin_private.h      |  2 +-
 5 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 0d93c52..c4801d5 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -452,6 +452,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 			}
 		}
 
+		btree->prepareDownlink(btree, stack->buffer);
 		btree->isDelete = FALSE;
 
 		/* search parent to lock */
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 221b39e..908e28f 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -580,12 +580,20 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 	rdata[1].len = MAXALIGN(maxoff * sizeofitem);
 	rdata[1].next = NULL;
 
-	/* Prepare a downlink tuple for insertion to the parent */
+	return lpage;
+}
+
+/*
+ * Prepare the state in 'btree' for inserting a downlink for given buffer.
+ */
+static void
+dataPrepareDownlink(GinBtree btree, Buffer lbuf)
+{
+	Page		lpage = BufferGetPage(lbuf);
+
 	PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
 	btree->pitem.key = *GinDataPageGetRightBound(lpage);
-	btree->rightblkno = BufferGetBlockNumber(rbuf);
-
-	return lpage;
+	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
 }
 
 /*
@@ -704,6 +712,7 @@ ginPrepareDataScan(GinBtree btree, Relation index)
 	btree->placeToPage = dataPlaceToPage;
 	btree->splitPage = dataSplitPage;
 	btree->fillRoot = ginDataFillRoot;
+	btree->prepareDownlink = dataPrepareDownlink;
 
 	btree->isData = TRUE;
 	btree->isDelete = FALSE;
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index a22fd32..378bce1 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -570,8 +570,7 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 	Size		lsize = 0,
 				size;
 	char	   *ptr;
-	IndexTuple	itup,
-				leftrightmost = NULL;
+	IndexTuple	itup;
 	Page		page;
 	Page		lpage = PageGetTempPageCopy(BufferGetPage(lbuf));
 	Page		rpage = BufferGetPage(rbuf);
@@ -635,7 +634,6 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 		}
 		else
 		{
-			leftrightmost = itup;
 			lsize += MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
 		}
 
@@ -645,11 +643,6 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 		ptr += MAXALIGN(IndexTupleSize(itup));
 	}
 
-	btree->entry = GinFormInteriorTuple(leftrightmost, lpage,
-										BufferGetBlockNumber(lbuf));
-
-	btree->rightblkno = BufferGetBlockNumber(rbuf);
-
 	data.node = btree->index->rd_node;
 	data.rootBlkno = InvalidBlockNumber;
 	data.lblkno = BufferGetBlockNumber(lbuf);
@@ -674,19 +667,19 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 }
 
 /*
- * return newly allocated rightmost tuple
+ * Prepare the state in 'btree' for inserting a downlink for given buffer.
  */
-IndexTuple
-ginPageGetLinkItup(Buffer buf)
+static void
+entryPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
-	IndexTuple	itup,
-				nitup;
-	Page		page = BufferGetPage(buf);
+	Page		lpage = BufferGetPage(lbuf);
+	IndexTuple	itup;
 
-	itup = getRightMostTuple(page);
-	nitup = GinFormInteriorTuple(itup, page, BufferGetBlockNumber(buf));
+	itup = getRightMostTuple(lpage);
 
-	return nitup;
+	btree->entry = GinFormInteriorTuple(itup, lpage,
+										BufferGetBlockNumber(lbuf));
+	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
 }
 
 /*
@@ -696,17 +689,19 @@ ginPageGetLinkItup(Buffer buf)
 void
 ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
 {
-	Page		page;
+	Page		page = BufferGetPage(root);
+	Page		lpage = BufferGetPage(lbuf);
+	Page		rpage = BufferGetPage(rbuf);
 	IndexTuple	itup;
 
-	page = BufferGetPage(root);
-
-	itup = ginPageGetLinkItup(lbuf);
+	itup = GinFormInteriorTuple(getRightMostTuple(lpage), lpage,
+								BufferGetBlockNumber(lbuf));
 	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
 
-	itup = ginPageGetLinkItup(rbuf);
+	itup = GinFormInteriorTuple(getRightMostTuple(rpage), rpage,
+								BufferGetBlockNumber(rbuf));
 	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
@@ -736,6 +731,7 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 	btree->placeToPage = entryPlaceToPage;
 	btree->splitPage = entrySplitPage;
 	btree->fillRoot = ginEntryFillRoot;
+	btree->prepareDownlink = entryPrepareDownlink;
 
 	btree->isData = FALSE;
 	btree->fullScan = FALSE;
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 4d0ccb8..ddac343 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -799,31 +799,20 @@ ginContinueSplit(ginIncompleteSplit *split)
 		ginPrepareEntryScan(&btree,
 							InvalidOffsetNumber, (Datum) 0, GIN_CAT_NULL_KEY,
 							&ginstate);
-		btree.entry = ginPageGetLinkItup(buffer);
 	}
 	else
 	{
-		Page		page = BufferGetPage(buffer);
-
 		ginPrepareDataScan(&btree, reln);
-
-		PostingItemSetBlockNumber(&(btree.pitem), split->leftBlkno);
-		if (GinPageIsLeaf(page))
-			btree.pitem.key = *GinDataPageGetItemPointer(page,
-														 GinPageGetOpaque(page)->maxoff);
-		else
-			btree.pitem.key = GinDataPageGetPostingItem(page,
-														GinPageGetOpaque(page)->maxoff)->key;
 	}
 
-	btree.rightblkno = split->rightBlkno;
-
 	stack.blkno = split->leftBlkno;
 	stack.buffer = buffer;
 	stack.off = InvalidOffsetNumber;
 	stack.parent = NULL;
 
 	ginFindParents(&btree, &stack, split->rootBlkno);
+
+	btree.prepareDownlink(&btree, buffer);
 	ginInsertValue(&btree, stack.parent, NULL);
 
 	FreeFakeRelcacheEntry(reln);
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 7491d7c..3952935 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -487,6 +487,7 @@ typedef struct GinBtreeData
 	OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
 	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, XLogRecData **);
 	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, XLogRecData **);
+	void		(*prepareDownlink) (GinBtree, Buffer);
 	void		(*fillRoot) (GinBtree, Buffer, Buffer, Buffer);
 
 	bool		isData;
@@ -529,7 +530,6 @@ extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 					Datum key, GinNullCategory category,
 					GinState *ginstate);
 extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
-extern IndexTuple ginPageGetLinkItup(Buffer buf);
 
 /* gindatapage.c */
 extern BlockNumber createPostingTree(Relation index,
-- 
1.8.4.rc3

0003-More-GIN-refactoring.patchtext/x-diff; name=0003-More-GIN-refactoring.patchDownload
>From ac038643d402474522ac38118a379d7523231fbd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 13 Nov 2013 18:15:22 +0200
Subject: [PATCH 3/4] More GIN refactoring.

Split off the portion of ginInsertValue that inserts the tuple to
current level into a separate function, ginPlaceToPage. ginInsertValue's
charter is now to recurse up the tree to insert the downlink, when a page
split is required.

This is in preparation for a patch to change the way incomplete splits
are handled, which will need to do these operations separately. And IMHO
makes the code more readable anyway.
---
 src/backend/access/gin/ginbtree.c | 276 ++++++++++++++++++++------------------
 1 file changed, 148 insertions(+), 128 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index c4801d5..7248b06 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -280,80 +280,115 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 }
 
 /*
- * Insert value (stored in GinBtree) to tree described by stack
- *
- * During an index build, buildStats is non-null and the counters
- * it contains are incremented as needed.
+ * Returns true if the insertion is done, false if the page was split and
+ * downlink insertion is pending.
  *
- * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ * stack->buffer is locked on entry, and is kept locked.
  */
-void
-ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
+static bool
+ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
+			   GinStatsData *buildStats)
 {
-	GinBtreeStack *parent;
-	BlockNumber rootBlkno;
-	Page		page,
-				rpage,
-				lpage;
+	Page		page = BufferGetPage(stack->buffer);
+	XLogRecData *rdata;
+	bool		fit;
 
-	/* extract root BlockNumber from stack */
-	Assert(stack != NULL);
-	parent = stack;
-	while (parent->parent)
-		parent = parent->parent;
-	rootBlkno = parent->blkno;
-	Assert(BlockNumberIsValid(rootBlkno));
+	START_CRIT_SECTION();
+	fit = btree->placeToPage(btree, stack->buffer, stack->off, &rdata);
+	if (fit)
+	{
+		MarkBufferDirty(stack->buffer);
 
-	/* this loop crawls up the stack until the insertion is complete */
-	for (;;)
+		if (RelationNeedsWAL(btree->index))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT, rdata);
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+
+		return true;
+	}
+	else
 	{
-		XLogRecData *rdata;
+		/* Didn't fit, have to split */
+		Buffer		rbuffer;
+		Page		newlpage;
 		BlockNumber savedRightLink;
-		bool		fit;
+		GinBtreeStack *parent;
+		Page		lpage,
+					rpage;
+
+		END_CRIT_SECTION();
+
+		rbuffer = GinNewBuffer(btree->index);
 
-		page = BufferGetPage(stack->buffer);
 		savedRightLink = GinPageGetOpaque(page)->rightlink;
 
-		START_CRIT_SECTION();
-		fit = btree->placeToPage(btree, stack->buffer, stack->off, &rdata);
-		if (fit)
+		/*
+		 * newlpage is a pointer to memory page, it is not associated with
+		 * a buffer. stack->buffer is not touched yet.
+		 */
+		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off, &rdata);
+
+		((ginxlogSplit *) (rdata->data))->rootBlkno = rootBlkno;
+
+		/* During index build, count the newly-split page */
+		if (buildStats)
 		{
+			if (btree->isData)
+				buildStats->nDataPages++;
+			else
+				buildStats->nEntryPages++;
+		}
+
+		parent = stack->parent;
+
+		if (parent == NULL)
+		{
+			/*
+			 * split root, so we need to allocate new left page and place
+			 * pointer on root to left and right page
+			 */
+			Buffer		lbuffer = GinNewBuffer(btree->index);
+
+			((ginxlogSplit *) (rdata->data))->isRootSplit = TRUE;
+			((ginxlogSplit *) (rdata->data))->rrlink = InvalidBlockNumber;
+
+			lpage = BufferGetPage(lbuffer);
+			rpage = BufferGetPage(rbuffer);
+
+			GinPageGetOpaque(rpage)->rightlink = InvalidBlockNumber;
+			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+			((ginxlogSplit *) (rdata->data))->lblkno = BufferGetBlockNumber(lbuffer);
+
+			START_CRIT_SECTION();
+
+			GinInitBuffer(stack->buffer, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF);
+			PageRestoreTempPage(newlpage, lpage);
+			btree->fillRoot(btree, stack->buffer, lbuffer, rbuffer);
+
+			MarkBufferDirty(rbuffer);
+			MarkBufferDirty(lbuffer);
 			MarkBufferDirty(stack->buffer);
 
 			if (RelationNeedsWAL(btree->index))
 			{
 				XLogRecPtr	recptr;
 
-				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT, rdata);
+				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
 				PageSetLSN(page, recptr);
+				PageSetLSN(lpage, recptr);
+				PageSetLSN(rpage, recptr);
 			}
 
-			LockBuffer(stack->buffer, GIN_UNLOCK);
-			END_CRIT_SECTION();
-
-			freeGinBtreeStack(stack);
-
-			return;
-		}
-		else
-		{
-			/* Didn't fit, have to split */
-			Buffer		rbuffer;
-			Page		newlpage;
-
+			UnlockReleaseBuffer(rbuffer);
+			UnlockReleaseBuffer(lbuffer);
 			END_CRIT_SECTION();
 
-			rbuffer = GinNewBuffer(btree->index);
-
-			/*
-			 * newlpage is a pointer to memory page, it is not associated with
-			 * a buffer. stack->buffer is not touched yet.
-			 */
-			newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off, &rdata);
-
-			((ginxlogSplit *) (rdata->data))->rootBlkno = rootBlkno;
-
-			/* During index build, count the newly-split page */
+			/* During index build, count the newly-added root page */
 			if (buildStats)
 			{
 				if (btree->isData)
@@ -362,98 +397,83 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 					buildStats->nEntryPages++;
 			}
 
-			parent = stack->parent;
-
-			if (parent == NULL)
-			{
-				/*
-				 * split root, so we need to allocate new left page and place
-				 * pointer on root to left and right page
-				 */
-				Buffer		lbuffer = GinNewBuffer(btree->index);
-
-				((ginxlogSplit *) (rdata->data))->isRootSplit = TRUE;
-				((ginxlogSplit *) (rdata->data))->rrlink = InvalidBlockNumber;
-
-				page = BufferGetPage(stack->buffer);
-				lpage = BufferGetPage(lbuffer);
-				rpage = BufferGetPage(rbuffer);
-
-				GinPageGetOpaque(rpage)->rightlink = InvalidBlockNumber;
-				GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
-				((ginxlogSplit *) (rdata->data))->lblkno = BufferGetBlockNumber(lbuffer);
-
-				START_CRIT_SECTION();
-
-				GinInitBuffer(stack->buffer, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF);
-				PageRestoreTempPage(newlpage, lpage);
-				btree->fillRoot(btree, stack->buffer, lbuffer, rbuffer);
-
-				MarkBufferDirty(rbuffer);
-				MarkBufferDirty(lbuffer);
-				MarkBufferDirty(stack->buffer);
+			return true;
+		}
+		else
+		{
+			/* split non-root page */
+			((ginxlogSplit *) (rdata->data))->isRootSplit = FALSE;
+			((ginxlogSplit *) (rdata->data))->rrlink = savedRightLink;
 
-				if (RelationNeedsWAL(btree->index))
-				{
-					XLogRecPtr	recptr;
+			lpage = BufferGetPage(stack->buffer);
+			rpage = BufferGetPage(rbuffer);
 
-					recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-					PageSetLSN(page, recptr);
-					PageSetLSN(lpage, recptr);
-					PageSetLSN(rpage, recptr);
-				}
+			GinPageGetOpaque(rpage)->rightlink = savedRightLink;
+			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
 
-				UnlockReleaseBuffer(rbuffer);
-				UnlockReleaseBuffer(lbuffer);
-				LockBuffer(stack->buffer, GIN_UNLOCK);
-				END_CRIT_SECTION();
+			START_CRIT_SECTION();
+			PageRestoreTempPage(newlpage, lpage);
 
-				freeGinBtreeStack(stack);
+			MarkBufferDirty(rbuffer);
+			MarkBufferDirty(stack->buffer);
 
-				/* During index build, count the newly-added root page */
-				if (buildStats)
-				{
-					if (btree->isData)
-						buildStats->nDataPages++;
-					else
-						buildStats->nEntryPages++;
-				}
+			if (RelationNeedsWAL(btree->index))
+			{
+				XLogRecPtr	recptr;
 
-				return;
+				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
+				PageSetLSN(lpage, recptr);
+				PageSetLSN(rpage, recptr);
 			}
-			else
-			{
-				/* split non-root page */
-				((ginxlogSplit *) (rdata->data))->isRootSplit = FALSE;
-				((ginxlogSplit *) (rdata->data))->rrlink = savedRightLink;
+			UnlockReleaseBuffer(rbuffer);
+			END_CRIT_SECTION();
 
-				lpage = BufferGetPage(stack->buffer);
-				rpage = BufferGetPage(rbuffer);
+			return false;
+		}
+	}
+}
 
-				GinPageGetOpaque(rpage)->rightlink = savedRightLink;
-				GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+/*
+ * Insert value (stored in GinBtree) to tree described by stack
+ *
+ * During an index build, buildStats is non-null and the counters
+ * it contains are incremented as needed.
+ *
+ * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ */
+void
+ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
+{
+	GinBtreeStack *parent;
+	BlockNumber rootBlkno;
+	Page		page;
+
+	/* extract root BlockNumber from stack */
+	Assert(stack != NULL);
+	parent = stack;
+	while (parent->parent)
+		parent = parent->parent;
+	rootBlkno = parent->blkno;
+	Assert(BlockNumberIsValid(rootBlkno));
 
-				START_CRIT_SECTION();
-				PageRestoreTempPage(newlpage, lpage);
+	/* this loop crawls up the stack until the insertion is complete */
+	for (;;)
+	{
+		bool done;
 
-				MarkBufferDirty(rbuffer);
-				MarkBufferDirty(stack->buffer);
+		done = ginPlaceToPage(btree, rootBlkno, stack, buildStats);
 
-				if (RelationNeedsWAL(btree->index))
-				{
-					XLogRecPtr	recptr;
+		/* just to be extra sure we don't delete anything by accident... */
+		btree->isDelete = FALSE;
 
-					recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-					PageSetLSN(lpage, recptr);
-					PageSetLSN(rpage, recptr);
-				}
-				UnlockReleaseBuffer(rbuffer);
-				END_CRIT_SECTION();
-			}
+		if (done)
+		{
+			LockBuffer(stack->buffer, GIN_UNLOCK);
+			freeGinBtreeStack(stack);
+			break;
 		}
 
 		btree->prepareDownlink(btree, stack->buffer);
-		btree->isDelete = FALSE;
 
 		/* search parent to lock */
 		LockBuffer(parent->buffer, GIN_EXCLUSIVE);
-- 
1.8.4.rc3

#2Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: Handling GIN incomplete splits

Hi,

Here is a review of the first three patches:
1) Further gin refactoring:
make check passes (core tests and contrib tests).
Code compiles without warnings.
Then... About the patch... Even if I got little experience with code of
gin, moving the flag for search mode out of btree, as well as removing the
logic of PostingTreeScan really makes the code lighter and easier to follow.
Just wondering, why not simplifying as well ginTraverseLock:ginbtree.c at
the same time to something similar to that?
if (!GinPageIsLeaf(page) || searchMode == TRUE)
return access;

/* we should relock our page */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_EXCLUSIVE);

/* But root can become non-leaf during relock */
if (!GinPageIsLeaf(page))
{
/* restore old lock type (very rare) */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_SHARE);
}
else
access = GIN_EXCLUSIVE;
return access;
Feel free to discard as I can imagine that changing such code would make
back-branch maintenance more difficult and it would increase conflicts with
patches currently in development.
2) Refactoring of internal gin btree (needs patch 1 applied first):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Yep, removing ginPageGetLinkItup makes sense. Just to be picky, I would
have put the arguments of GinFormInteriorTuple replacing ginPageGetLinkItup
in 3 separate lines just for lisibility.
In dataPrepareDownlink:gindatapage.c, depending on if lpage is a leaf page
or not, isn't it inconsistent with the older code not to use
GinDataPageGetItemPointer and GinDataPageGetPostingItem to set
btree->pitem.key.
In ginContinueSplit:ginxlog.c, could it be possible to remove this code? It
looks that its deletion has been forgotten:
/*

* elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u", split->rootBlkno,
* split->leftBlkno, split->rightBlkno);
*/
Except the doubt about dataPrepareDownlink (related to my lack of knowledge
of the code), patch looks good.
3) More refactoring (needs patches 1 and 2):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Perhaps this patch would have been easier to read with context diffs :) It
just moves code around so nothing to say.

Then, I have done a small test with all 3 patches applied. Test is done
with pg_trgm by uploading the book "Les Miserables":
=# CREATE TABLE les_miserables (num serial, line text);
CREATE TABLE
=# \copy les_miserables (line) FROM '~/Desktop/pg135.txt';
=# select count(*) from les_miserables;
count
-------
68116
(1 row)
=# CREATE INDEX les_miserables_idx ON les_miserables USING gin (line
gin_trgm_ops);
CREATE INDEX

And here is the result of this query (average of a couple of 5 runs):
=# explain analyse SELECT * FROM les_miserables where line ~~ '%Cosette%';
Vanilla server: 5.289 ms
With patch 1 only: 5.283 ms
With patches 1+2: 5.232 ms
With patches 1+2+3: 5.232 ms
Based on that there is no performance degradation.

I just began reading the 4th patch. As it is a bit more complex and needs
more testing, I'll provide feedback later.
Regards,

On Thu, Nov 14, 2013 at 1:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com

wrote:

Here's another part of my crusade against xlog cleanup routines. This
series of patches gets rid of the gin_cleanup() function, which is
currently used to finish splits of GIN b-tree pages, if the system crashes
(or an error occurs) between splitting a page and inserting its downlink to
the parent.

The first three patches just move code around. IMHO they make the code
more readable, so they should be committed in any case. The meat is in the
fourth patch.

Thoughts, objections?

--
Michael

#3Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#2)
Re: Handling GIN incomplete splits

Here are some comments about the 4th patch.
1) Compiles without warnings, passes make check.
2) s/ginFinshSplit/ginFinishSplit
3) Server crashes when trying to create a gin index index creation (see
example of previous email with pg_trgm). Here is the backtrace of the crash:
* thread #1: tid = 0x15221, 0x00007fff8c59f866
libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread,
stop reason = signal SIGABRT
frame #0: 0x00007fff8c59f866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8e60735c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff91586bba libsystem_c.dylib`abort + 125
frame #3: 0x000000010e953ed9
postgres`ExceptionalCondition(conditionName=0x000000010ea31055,
errorType=0x000000010e9b5973, fileName=0x000000010ea2fd3d, lineNumber=1175)
+ 137 at assert.c:54
frame #4: 0x000000010e79073a
postgres`UnpinBuffer(buf=0x000000010f37f9c0, fixOwner='\0') + 282 at
bufmgr.c:1175
frame #5: 0x000000010e793465 postgres`ReleaseBuffer(buffer=3169) + 565
at bufmgr.c:2540
frame #6: 0x000000010e414e43
postgres`freeGinBtreeStack(stack=0x00007fa2138adcf8) + 67 at ginbtree.c:196
frame #7: 0x000000010e415330
postgres`ginInsertValue(btree=0x00007fff51807e80, stack=0x00007fa2138a6dd8,
insertdata=0x00007fff51807e70, buildStats=0x00007fff51809fa0) + 1216 at
ginbtree.c:728
frame #8: 0x000000010e404ebf
postgres`ginEntryInsert(ginstate=0x00007fff51807fe0, attnum=1, key=7828073,
category='\0', items=0x0000000117d0ab28, nitem=76,
buildStats=0x00007fff51809fa0) + 1055 at gininsert.c:218
frame #9: 0x000000010e405ad6
postgres`ginbuild(fcinfo=0x00007fff5180a050) + 1590 at gininsert.c:392
frame #10: 0x000000010e9609ba
postgres`OidFunctionCall3Coll(functionId=2738, collation=0,
arg1=4693605424, arg2=4693760992, arg3=140334089145912) + 186 at fmgr.c:1649
frame #11: 0x000000010e4f4b30
postgres`index_build(heapRelation=0x0000000117c2bc30,
indexRelation=0x0000000117c51be0, indexInfo=0x00007fa213888e38,
isprimary='\0', isreindex='\0') + 464 at index.c:1963
frame #12: 0x000000010e4f2f07
postgres`index_create(heapRelation=0x0000000117c2bc30,
indexRelationName=0x00007fa213888b30, indexRelationId=16445, relFileNode=0,
indexInfo=0x00007fa213888e38, indexColNames=0x00007fa2138892d8,
accessMethodObjectId=2742, tableSpaceId=0,
collationObjectId=0x00007fa213889330, classObjectId=0x00007fa213889350,
coloptions=0x00007fa213889370, reloptions=0, isprimary='\0',
isconstraint='\0', deferrable='\0', initdeferred='\0',
allow_system_table_mods='\0', skip_build='\0', concurrent='\0',
is_internal='\0') + 3591 at index.c:1082
frame #13: 0x000000010e5da885
postgres`DefineIndex(stmt=0x00007fa213888b90, indexRelationId=0,
is_alter_table='\0', check_rights='\x01', skip_build='\0', quiet='\0') +
4181 at indexcmds.c:595
frame #14: 0x000000010e7dd4a3
postgres`ProcessUtilitySlow(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 2931 at utility.c:1163
frame #15: 0x000000010e7dc4d7
postgres`standard_ProcessUtility(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 3511 at utility.c:873
frame #16: 0x000000010e7db719
postgres`ProcessUtility(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 185 at utility.c:352
frame #17: 0x000000010e7db0e5
postgres`PortalRunUtility(portal=0x00007fa213889638,
utilityStmt=0x00007fa21384b530, isTopLevel='\x01', dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 325 at pquery.c:1187
frame #18: 0x000000010e7da002
postgres`PortalRunMulti(portal=0x00007fa213889638, isTopLevel='\x01',
dest=0x00007fa21384b918, altdest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 514 at pquery.c:1318
frame #19: 0x000000010e7d95c4
postgres`PortalRun(portal=0x00007fa213889638, count=9223372036854775807,
isTopLevel='\x01', dest=0x00007fa21384b918, altdest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 964 at pquery.c:816
frame #20: 0x000000010e7d4d77
postgres`exec_simple_query(query_string=0x00007fa21384a838) + 1207 at
postgres.c:1048
frame #21: 0x000000010e7d3fc1 postgres`PostgresMain(argc=1,
argv=0x00007fa21301a3c0, dbname=0x00007fa21301a228,
username=0x00007fa21301a208) + 2753 at postgres.c:3992
frame #22: 0x000000010e75868c
postgres`BackendRun(port=0x00007fa212e00240) + 700 at postmaster.c:4085
frame #23: 0x000000010e757c81
postgres`BackendStartup(port=0x00007fa212e00240) + 433 at postmaster.c:3774
frame #24: 0x000000010e7544fe postgres`ServerLoop + 606 at
postmaster.c:1585
frame #25: 0x000000010e751d74 postgres`PostmasterMain(argc=3,
argv=0x00007fa212c041f0) + 5380 at postmaster.c:1240
frame #26: 0x000000010e6930fd postgres`main(argc=3,
argv=0x00007fa212c041f0) + 653 at main.c:196
Test has been done on OSX 10.9.

This is all I have for now... I will have a look at the code later, as this
patch changes quite a bit the internals of gin, it is going to take a
little bit of time.
Thanks,
--
Michael

#4Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#2)
Re: Handling GIN incomplete splits

On 19.11.2013 14:48, Michael Paquier wrote:

Here is a review of the first three patches:
1) Further gin refactoring:
make check passes (core tests and contrib tests).
Code compiles without warnings.

Committed.

Then... About the patch... Even if I got little experience with code of
gin, moving the flag for search mode out of btree, as well as removing the
logic of PostingTreeScan really makes the code lighter and easier to follow.
Just wondering, why not simplifying as well ginTraverseLock:ginbtree.c at
the same time to something similar to that?
if (!GinPageIsLeaf(page) || searchMode == TRUE)
return access;

/* we should relock our page */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_EXCLUSIVE);

/* But root can become non-leaf during relock */
if (!GinPageIsLeaf(page))
{
/* restore old lock type (very rare) */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_SHARE);
}
else
access = GIN_EXCLUSIVE;
return access;
Feel free to discard as I can imagine that changing such code would make
back-branch maintenance more difficult and it would increase conflicts with
patches currently in development.

Yeah, might be more readable to write it that way. There's a lot of
cleanup that could be done to the gin code, these patches are by no
means the end of it.

2) Refactoring of internal gin btree (needs patch 1 applied first):
make check passes (core tests and contrib tests).
Code compiles without warnings.

Committed.

Yep, removing ginPageGetLinkItup makes sense. Just to be picky, I would
have put the arguments of GinFormInteriorTuple replacing ginPageGetLinkItup
in 3 separate lines just for lisibility.

Ok, did that.

In dataPrepareDownlink:gindatapage.c, depending on if lpage is a leaf page
or not, isn't it inconsistent with the older code not to use
GinDataPageGetItemPointer and GinDataPageGetPostingItem to set
btree->pitem.key.

Hmm. The old code in dataSplitPage() didn't use
GinDataPageGetItemPointer/PostingItem either.

The corresponding code in ginContinueSplit did, though. There was
actually an inconsistency there: the ginContinueSplit function took the
downlink's key from the last item on the page (using maxoff), while
dataSplitPage took it from the right bound using
GinDataPageGetRightBound(). Both are the same, dataSplitPage copies the
value from the last item to the right bound, so it doesn't make a
difference. They would diverge if the last item on the page is deleted,
though, so the old coding in ginContinueSplit was actually a bit suspicious.

In ginContinueSplit:ginxlog.c, could it be possible to remove this code? It
looks that its deletion has been forgotten:
/*

* elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u", split->rootBlkno,
* split->leftBlkno, split->rightBlkno);
*/

Yeah, that's just leftover debug code. But again, I'll leave that for
another patch (in fact, the whole function will go away with the fourth
patch, anyway).

3) More refactoring (needs patches 1 and 2):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Perhaps this patch would have been easier to read with context diffs :) It
just moves code around so nothing to say.

Committed.

Thanks for the review! I'll let you finish the review of the fourth
patch. Meanwhile, I'll take another look at Alexander's gin packed
posting items patch, and see how badly these commits bitrotted it.
- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#3)
2 attachment(s)
Re: Handling GIN incomplete splits

On 20.11.2013 15:27, Michael Paquier wrote:

Here are some comments about the 4th patch.
1) Compiles without warnings, passes make check.
2) s/ginFinshSplit/ginFinishSplit
3) Server crashes when trying to create a gin index index creation (see
example of previous email with pg_trgm).

Thanks, fixed!

Here's a new version. To ease the review, I split the remaining patch
again into two, where the first patch is just yet more refactoring.

I also fixed some bugs in WAL logging and replay that I bumped into
while testing.

- Heikki

Attachments:

0001-More-GIN-refactoring.patchtext/x-diff; name=0001-More-GIN-refactoring.patchDownload
>From 040ebdd9dd7025c4e0fdccef05975848da3d2ee0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Nov 2013 23:11:23 +0200
Subject: [PATCH 1/2] More GIN refactoring.

Separate the insertion payload from the more static portions of GinBtree in
the gin b-tree functions.

Add root block number to GinBtree, instead of passing it around all the
functions as argument.

Split off ginFinishSplit() from ginInsertValue(), which is responsible for
finding the parent and inserting the downlink to it.
---
 src/backend/access/gin/ginbtree.c     | 157 +++++++++++++++++++++------------
 src/backend/access/gin/gindatapage.c  | 159 ++++++++++++++++++----------------
 src/backend/access/gin/ginentrypage.c |  81 +++++++++--------
 src/backend/access/gin/ginget.c       |   2 +-
 src/backend/access/gin/gininsert.c    |  11 ++-
 src/backend/access/gin/ginxlog.c      |  22 ++---
 src/include/access/gin_private.h      |  39 ++++++---
 7 files changed, 272 insertions(+), 199 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 7248b06..3c352e9 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -63,13 +63,13 @@ ginTraverseLock(Buffer buffer, bool searchMode)
  * is share-locked, and stack->parent is NULL.
  */
 GinBtreeStack *
-ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode)
+ginFindLeafPage(GinBtree btree, bool searchMode)
 {
 	GinBtreeStack *stack;
 
 	stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
-	stack->blkno = rootBlkno;
-	stack->buffer = ReadBuffer(btree->index, rootBlkno);
+	stack->blkno = btree->rootBlkno;
+	stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
 	stack->parent = NULL;
 	stack->predictNumber = 1;
 
@@ -89,7 +89,7 @@ ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode)
 		 * ok, page is correctly locked, we should check to move right ..,
 		 * root never has a right link, so small optimization
 		 */
-		while (btree->fullScan == FALSE && stack->blkno != rootBlkno &&
+		while (btree->fullScan == FALSE && stack->blkno != btree->rootBlkno &&
 			   btree->isMoveRight(btree, page))
 		{
 			BlockNumber rightlink = GinPageGetOpaque(page)->rightlink;
@@ -189,8 +189,7 @@ freeGinBtreeStack(GinBtreeStack *stack)
  * with vacuum process
  */
 void
-ginFindParents(GinBtree btree, GinBtreeStack *stack,
-			   BlockNumber rootBlkno)
+ginFindParents(GinBtree btree, GinBtreeStack *stack)
 {
 	Page		page;
 	Buffer		buffer;
@@ -204,8 +203,8 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 	{
 		/* XLog mode... */
 		root = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
-		root->blkno = rootBlkno;
-		root->buffer = ReadBuffer(btree->index, rootBlkno);
+		root->blkno = btree->rootBlkno;
+		root->buffer = ReadBuffer(btree->index, btree->rootBlkno);
 		LockBuffer(root->buffer, GIN_EXCLUSIVE);
 		root->parent = NULL;
 	}
@@ -221,8 +220,8 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 			root = root->parent;
 		}
 
-		Assert(root->blkno == rootBlkno);
-		Assert(BufferGetBlockNumber(root->buffer) == rootBlkno);
+		Assert(root->blkno == btree->rootBlkno);
+		Assert(BufferGetBlockNumber(root->buffer) == btree->rootBlkno);
 		LockBuffer(root->buffer, GIN_EXCLUSIVE);
 	}
 	root->off = InvalidOffsetNumber;
@@ -268,7 +267,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 			ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
 			ptr->blkno = blkno;
 			ptr->buffer = buffer;
-			ptr->parent = root; /* it's may be wrong, but in next call we will
+			ptr->parent = root; /* it may be wrong, but in next call we will
 								 * correct */
 			ptr->off = offset;
 			stack->parent = ptr;
@@ -280,21 +279,35 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 }
 
 /*
- * Returns true if the insertion is done, false if the page was split and
- * downlink insertion is pending.
+ * Insert a new item to a page.
+ *
+ * Returns true if the insertion was finished. On false, the page was split and
+ * the parent needs to be updated. (a root update returns true as it doesn't
+ * need any further action by the caller to complete)
+ *
+ * When inserting a downlink to a internal page, the existing item at the
+ * given location is updated to point to 'updateblkno'.
  *
  * stack->buffer is locked on entry, and is kept locked.
  */
 static bool
-ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
+ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
+			   void *insertdata, BlockNumber updateblkno,
 			   GinStatsData *buildStats)
 {
 	Page		page = BufferGetPage(stack->buffer);
 	XLogRecData *rdata;
 	bool		fit;
 
+	/*
+	 * Try to put the incoming tuple on the page. If it doesn't fit,
+	 * placeToPage method will return false and leave the page unmodified,
+	 * and we'll have to split the page.
+	 */
 	START_CRIT_SECTION();
-	fit = btree->placeToPage(btree, stack->buffer, stack->off, &rdata);
+	fit = btree->placeToPage(btree, stack->buffer, stack->off,
+							 insertdata, updateblkno,
+							 &rdata);
 	if (fit)
 	{
 		MarkBufferDirty(stack->buffer);
@@ -324,6 +337,14 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 		END_CRIT_SECTION();
 
 		rbuffer = GinNewBuffer(btree->index);
+		/* During index build, count the new page */
+		if (buildStats)
+		{
+			if (btree->isData)
+				buildStats->nDataPages++;
+			else
+				buildStats->nEntryPages++;
+		}
 
 		savedRightLink = GinPageGetOpaque(page)->rightlink;
 
@@ -331,18 +352,11 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 		 * newlpage is a pointer to memory page, it is not associated with
 		 * a buffer. stack->buffer is not touched yet.
 		 */
-		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off, &rdata);
+		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off,
+									insertdata, updateblkno,
+									&rdata);
 
-		((ginxlogSplit *) (rdata->data))->rootBlkno = rootBlkno;
-
-		/* During index build, count the newly-split page */
-		if (buildStats)
-		{
-			if (btree->isData)
-				buildStats->nDataPages++;
-			else
-				buildStats->nEntryPages++;
-		}
+		((ginxlogSplit *) (rdata->data))->rootBlkno = btree->rootBlkno;
 
 		parent = stack->parent;
 
@@ -354,6 +368,15 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 			 */
 			Buffer		lbuffer = GinNewBuffer(btree->index);
 
+			/* During index build, count the new page */
+			if (buildStats)
+			{
+				if (btree->isData)
+					buildStats->nDataPages++;
+				else
+					buildStats->nEntryPages++;
+			}
+
 			((ginxlogSplit *) (rdata->data))->isRootSplit = TRUE;
 			((ginxlogSplit *) (rdata->data))->rrlink = InvalidBlockNumber;
 
@@ -434,46 +457,27 @@ ginPlaceToPage(GinBtree btree, BlockNumber rootBlkno, GinBtreeStack *stack,
 }
 
 /*
- * Insert value (stored in GinBtree) to tree described by stack
+ * Finish a split by inserting the downlink for the new page to parent.
  *
- * During an index build, buildStats is non-null and the counters
- * it contains are incremented as needed.
+ * On entry, stack->buffer is exclusively locked.
  *
  * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
  */
 void
-ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
+ginFinishSplit(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 {
-	GinBtreeStack *parent;
-	BlockNumber rootBlkno;
 	Page		page;
-
-	/* extract root BlockNumber from stack */
-	Assert(stack != NULL);
-	parent = stack;
-	while (parent->parent)
-		parent = parent->parent;
-	rootBlkno = parent->blkno;
-	Assert(BlockNumberIsValid(rootBlkno));
+	bool		done;
 
 	/* this loop crawls up the stack until the insertion is complete */
-	for (;;)
+	do
 	{
-		bool done;
-
-		done = ginPlaceToPage(btree, rootBlkno, stack, buildStats);
-
-		/* just to be extra sure we don't delete anything by accident... */
-		btree->isDelete = FALSE;
-
-		if (done)
-		{
-			LockBuffer(stack->buffer, GIN_UNLOCK);
-			freeGinBtreeStack(stack);
-			break;
-		}
+		GinBtreeStack *parent = stack->parent;
+		void	   *insertdata;
+		BlockNumber updateblkno;
 
-		btree->prepareDownlink(btree, stack->buffer);
+		insertdata = btree->prepareDownlink(btree, stack->buffer);
+		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
 
 		/* search parent to lock */
 		LockBuffer(parent->buffer, GIN_EXCLUSIVE);
@@ -491,7 +495,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 				 * plain search...
 				 */
 				LockBuffer(parent->buffer, GIN_UNLOCK);
-				ginFindParents(btree, stack, rootBlkno);
+				ginFindParents(btree, stack);
 				parent = stack->parent;
 				Assert(parent != NULL);
 				break;
@@ -502,8 +506,49 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 			page = BufferGetPage(parent->buffer);
 		}
 
+		/* release the child */
 		UnlockReleaseBuffer(stack->buffer);
 		pfree(stack);
 		stack = parent;
+
+		/* insert the downlink to parent */
+		done = ginPlaceToPage(btree, stack,
+							  insertdata, updateblkno,
+							  buildStats);
+		pfree(insertdata);
+	} while (!done);
+	LockBuffer(stack->buffer, GIN_UNLOCK);
+
+	/* free the rest of the stack */
+	freeGinBtreeStack(stack);
+}
+
+/*
+ * Insert a value to tree described by stack.
+ *
+ * The value to be inserted is given in 'insertdata'. Its format depends
+ * on whether this is an entry or data tree, ginInsertValue just passes it
+ * through to the tree-specific callback function.
+ *
+ * During an index build, buildStats is non-null and the counters
+ * it contains are incremented as needed.
+ *
+ * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ */
+void
+ginInsertValue(GinBtree btree, GinBtreeStack *stack, void *insertdata,
+			   GinStatsData *buildStats)
+{
+	bool		done;
+
+	done = ginPlaceToPage(btree, stack,
+						  insertdata, InvalidBlockNumber,
+						  buildStats);
+	if (done)
+	{
+		LockBuffer(stack->buffer, GIN_UNLOCK);
+		freeGinBtreeStack(stack);
 	}
+	else
+		ginFinishSplit(btree, stack, buildStats);
 }
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index fbdde1d..647d383 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -30,7 +30,7 @@ dataIsMoveRight(GinBtree btree, Page page)
 	if (GinPageRightMost(page))
 		return FALSE;
 
-	return (ginCompareItemPointers(btree->items + btree->curitem, iptr) > 0) ? TRUE : FALSE;
+	return (ginCompareItemPointers(&btree->itemptr, iptr) > 0) ? TRUE : FALSE;
 }
 
 /*
@@ -80,7 +80,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
 		else
 		{
 			pitem = GinDataPageGetPostingItem(page, mid);
-			result = ginCompareItemPointers(btree->items + btree->curitem, &(pitem->key));
+			result = ginCompareItemPointers(&btree->itemptr, &(pitem->key));
 		}
 
 		if (result == 0)
@@ -138,7 +138,7 @@ dataLocateLeafItem(GinBtree btree, GinBtreeStack *stack)
 	{
 		OffsetNumber mid = low + ((high - low) / 2);
 
-		result = ginCompareItemPointers(btree->items + btree->curitem,
+		result = ginCompareItemPointers(&btree->itemptr,
 										GinDataPageGetItemPointer(page, mid));
 
 		if (result == 0)
@@ -298,18 +298,18 @@ GinPageDeletePostingItem(Page page, OffsetNumber offset)
  * item pointer never deletes!
  */
 static bool
-dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
+dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off, void *insertdata)
 {
 	Page		page = BufferGetPage(buf);
 
 	Assert(GinPageIsData(page));
-	Assert(!btree->isDelete);
 
 	if (GinPageIsLeaf(page))
 	{
+		GinBtreeDataLeafInsertData *items = insertdata;
 		if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
 		{
-			if ((btree->nitem - btree->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
+			if ((items->nitem - items->curitem) * sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
 				return true;
 		}
 		else if (sizeof(ItemPointerData) <= GinDataPageGetFreeSpace(page))
@@ -322,42 +322,21 @@ dataIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
 }
 
 /*
- * In case of previous split update old child blkno to
- * new right page
- * item pointer never deletes!
- */
-static BlockNumber
-dataPrepareData(GinBtree btree, Page page, OffsetNumber off)
-{
-	BlockNumber ret = InvalidBlockNumber;
-
-	Assert(GinPageIsData(page));
-
-	if (!GinPageIsLeaf(page) && btree->rightblkno != InvalidBlockNumber)
-	{
-		PostingItem *pitem = GinDataPageGetPostingItem(page, off);
-
-		PostingItemSetBlockNumber(pitem, btree->rightblkno);
-		ret = btree->rightblkno;
-	}
-
-	btree->rightblkno = InvalidBlockNumber;
-
-	return ret;
-}
-
-/*
  * Places keys to page and fills WAL record. In case leaf page and
  * build mode puts all ItemPointers to page.
  *
  * If none of the keys fit, returns false without modifying the page.
+ *
+ * On an insertion to an internal node, in addition to inserting the given
+ * item, the downlink of the existing item at 'off' is updated to point to
+ * 'updateblkno'.
  */
 static bool
 dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
+				void *insertdata, BlockNumber updateblkno,
 				XLogRecData **prdata)
 {
 	Page		page = BufferGetPage(buf);
-	int			sizeofitem = GinSizeOfDataPageItem(page);
 	int			cnt = 0;
 
 	/* these must be static so they can be returned to caller */
@@ -365,14 +344,20 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 	static ginxlogInsert data;
 
 	/* quick exit if it doesn't fit */
-	if (!dataIsEnoughSpace(btree, buf, off))
+	if (!dataIsEnoughSpace(btree, buf, off, insertdata))
 		return false;
 
 	*prdata = rdata;
 	Assert(GinPageIsData(page));
 
-	data.updateBlkno = dataPrepareData(btree, page, off);
+	/* Update existing downlink to point to next page (on internal page) */
+	if (!GinPageIsLeaf(page))
+	{
+		PostingItem *pitem = GinDataPageGetPostingItem(page, off);
+		PostingItemSetBlockNumber(pitem, updateblkno);
+	}
 
+	data.updateBlkno = updateblkno;
 	data.node = btree->index->rd_node;
 	data.blkno = BufferGetBlockNumber(buf);
 	data.offset = off;
@@ -405,34 +390,42 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 	cnt++;
 
 	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem));
-	rdata[cnt].len = sizeofitem;
+	/* data and len filled in below */
 	rdata[cnt].next = NULL;
 
 	if (GinPageIsLeaf(page))
 	{
+		GinBtreeDataLeafInsertData *items = insertdata;
+		uint32		savedPos = items->curitem;
+
 		if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
 		{
 			/* usually, create index... */
-			uint32		savedPos = btree->curitem;
-
-			while (btree->curitem < btree->nitem)
+			while (items->curitem < items->nitem)
 			{
-				GinDataPageAddItemPointer(page, btree->items + btree->curitem, off);
+				GinDataPageAddItemPointer(page, items->items + items->curitem, off);
 				off++;
-				btree->curitem++;
+				items->curitem++;
 			}
-			data.nitem = btree->curitem - savedPos;
-			rdata[cnt].len = sizeofitem * data.nitem;
+			data.nitem = items->curitem - savedPos;
 		}
 		else
 		{
-			GinDataPageAddItemPointer(page, btree->items + btree->curitem, off);
-			btree->curitem++;
+			GinDataPageAddItemPointer(page, items->items + items->curitem, off);
+			items->curitem++;
 		}
+
+		rdata[cnt].data = (char *) &items->items[savedPos];
+		rdata[cnt].len = sizeof(ItemPointerData) * data.nitem;
 	}
 	else
-		GinDataPageAddPostingItem(page, &(btree->pitem), off);
+	{
+		PostingItem *pitem = insertdata;
+		GinDataPageAddPostingItem(page, pitem, off);
+
+		rdata[cnt].data = (char *) pitem;
+		rdata[cnt].len = sizeof(PostingItem);
+	}
 
 	return true;
 }
@@ -444,7 +437,8 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
  * left page
  */
 static Page
-dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
+dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+			  void *insertdata, BlockNumber updateblkno, XLogRecData **prdata)
 {
 	char	   *ptr;
 	OffsetNumber separator;
@@ -457,7 +451,6 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 	Page		rpage = BufferGetPage(rbuf);
 	Size		pageSize = PageGetPageSize(lpage);
 	Size		freeSpace;
-	uint32		nCopied = 1;
 
 	/* these must be static so they can be returned to caller */
 	static ginxlogSplit data;
@@ -468,9 +461,13 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 	freeSpace = GinDataPageGetFreeSpace(rpage);
 
 	*prdata = rdata;
-	data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
-		InvalidOffsetNumber : PostingItemGetBlockNumber(&(btree->pitem));
-	data.updateBlkno = dataPrepareData(btree, lpage, off);
+
+	/* Update existing downlink to point to next page (on internal page) */
+	if (!isleaf)
+	{
+		PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+		PostingItemSetBlockNumber(pitem, updateblkno);
+	}
 
 	if (isleaf)
 	{
@@ -487,16 +484,15 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 
 	if (isleaf && GinPageRightMost(lpage) && off > GinPageGetOpaque(lpage)->maxoff)
 	{
-		nCopied = 0;
-		while (btree->curitem < btree->nitem &&
+		GinBtreeDataLeafInsertData *items = insertdata;
+		while (items->curitem < items->nitem &&
 			   maxoff * sizeof(ItemPointerData) < 2 * (freeSpace - sizeof(ItemPointerData)))
 		{
 			memcpy(vector + maxoff * sizeof(ItemPointerData),
-				   btree->items + btree->curitem,
+				   items->items + items->curitem,
 				   sizeof(ItemPointerData));
 			maxoff++;
-			nCopied++;
-			btree->curitem++;
+			items->curitem++;
 		}
 	}
 	else
@@ -506,11 +502,15 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 			memmove(ptr + sizeofitem, ptr, (maxoff - off + 1) * sizeofitem);
 		if (isleaf)
 		{
-			memcpy(ptr, btree->items + btree->curitem, sizeofitem);
-			btree->curitem++;
+			GinBtreeDataLeafInsertData *items = insertdata;
+			memcpy(ptr, items->items + items->curitem, sizeofitem);
+			items->curitem++;
 		}
 		else
-			memcpy(ptr, &(btree->pitem), sizeofitem);
+		{
+			PostingItem *pitem = insertdata;
+			memcpy(ptr, pitem, sizeofitem);
+		}
 
 		maxoff++;
 	}
@@ -584,16 +584,18 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRe
 }
 
 /*
- * Prepare the state in 'btree' for inserting a downlink for given buffer.
+ * Construct insertion payload for inserting the downlink for given buffer.
  */
-static void
+static void *
 dataPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
+	PostingItem	*pitem = palloc(sizeof(PostingItem));
 	Page		lpage = BufferGetPage(lbuf);
 
-	PostingItemSetBlockNumber(&(btree->pitem), BufferGetBlockNumber(lbuf));
-	btree->pitem.key = *GinDataPageGetRightBound(lpage);
-	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
+	PostingItemSetBlockNumber(pitem, BufferGetBlockNumber(lbuf));
+	pitem->key = *GinDataPageGetRightBound(lpage);
+
+	return pitem;
 }
 
 /*
@@ -698,11 +700,12 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
 }
 
 void
-ginPrepareDataScan(GinBtree btree, Relation index)
+ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno)
 {
 	memset(btree, 0, sizeof(GinBtreeData));
 
 	btree->index = index;
+	btree->rootBlkno = rootBlkno;
 
 	btree->findChildPage = dataLocateItem;
 	btree->getLeftMostChild = dataGetLeftMostPage;
@@ -715,7 +718,6 @@ ginPrepareDataScan(GinBtree btree, Relation index)
 	btree->prepareDownlink = dataPrepareDownlink;
 
 	btree->isData = TRUE;
-	btree->isDelete = FALSE;
 	btree->fullScan = FALSE;
 	btree->isBuild = FALSE;
 }
@@ -729,29 +731,32 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  GinStatsData *buildStats)
 {
 	GinBtreeData btree;
+	GinBtreeDataLeafInsertData insertdata;
 	GinBtreeStack *stack;
 
-	ginPrepareDataScan(&btree, index);
+	ginPrepareDataScan(&btree, index, rootBlkno);
 	btree.isBuild = (buildStats != NULL);
-	btree.items = items;
-	btree.nitem = nitem;
-	btree.curitem = 0;
+	insertdata.items = items;
+	insertdata.nitem = nitem;
+	insertdata.curitem = 0;
 
-	while (btree.curitem < btree.nitem)
+	while (insertdata.curitem < insertdata.nitem)
 	{
-		stack = ginFindLeafPage(&btree, rootBlkno, false);
+		/* search for the leaf page where the first item should go to */
+		btree.itemptr = insertdata.items[insertdata.curitem];
+		stack = ginFindLeafPage(&btree, false);
 
 		if (btree.findItem(&btree, stack))
 		{
 			/*
-			 * btree.items[btree.curitem] already exists in index
+			 * Current item already exists in index.
 			 */
-			btree.curitem++;
+			insertdata.curitem++;
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 			freeGinBtreeStack(stack);
 		}
 		else
-			ginInsertValue(&btree, stack, buildStats);
+			ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
 }
 
@@ -764,11 +769,11 @@ ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno)
 	GinBtreeData btree;
 	GinBtreeStack *stack;
 
-	ginPrepareDataScan(&btree, index);
+	ginPrepareDataScan(&btree, index, rootBlkno);
 
 	btree.fullScan = TRUE;
 
-	stack = ginFindLeafPage(&btree, rootBlkno, TRUE);
+	stack = ginFindLeafPage(&btree, TRUE);
 
 	return stack;
 }
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index ec0114e..56cbf8d 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -431,22 +431,23 @@ entryGetLeftMostPage(GinBtree btree, Page page)
 }
 
 static bool
-entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
+entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
+				   GinBtreeEntryInsertData *entry)
 {
 	Size		itupsz = 0;
 	Page		page = BufferGetPage(buf);
 
-	Assert(btree->entry);
+	Assert(entry->entry);
 	Assert(!GinPageIsData(page));
 
-	if (btree->isDelete)
+	if (entry->isDelete)
 	{
 		IndexTuple	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, off));
 
 		itupsz = MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
 	}
 
-	if (PageGetFreeSpace(page) + itupsz >= MAXALIGN(IndexTupleSize(btree->entry)) + sizeof(ItemIdData))
+	if (PageGetFreeSpace(page) + itupsz >= MAXALIGN(IndexTupleSize(entry->entry)) + sizeof(ItemIdData))
 		return true;
 
 	return false;
@@ -457,42 +458,42 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off)
  * should update it, update old child blkno to new right page
  * if child split occurred
  */
-static BlockNumber
-entryPreparePage(GinBtree btree, Page page, OffsetNumber off)
+static void
+entryPreparePage(GinBtree btree, Page page, OffsetNumber off,
+				 GinBtreeEntryInsertData *entry, BlockNumber updateblkno)
 {
-	BlockNumber ret = InvalidBlockNumber;
-
-	Assert(btree->entry);
+	Assert(entry->entry);
 	Assert(!GinPageIsData(page));
 
-	if (btree->isDelete)
+	if (entry->isDelete)
 	{
 		Assert(GinPageIsLeaf(page));
 		PageIndexTupleDelete(page, off);
 	}
 
-	if (!GinPageIsLeaf(page) && btree->rightblkno != InvalidBlockNumber)
+	if (!GinPageIsLeaf(page) && updateblkno != InvalidBlockNumber)
 	{
 		IndexTuple	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, off));
 
-		GinSetDownlink(itup, btree->rightblkno);
-		ret = btree->rightblkno;
+		GinSetDownlink(itup, updateblkno);
 	}
-
-	btree->rightblkno = InvalidBlockNumber;
-
-	return ret;
 }
 
 /*
  * Place tuple on page and fills WAL record
  *
  * If the tuple doesn't fit, returns false without modifying the page.
+ *
+ * On an insertion to an internal node, in addition to inserting the given
+ * item, the downlink of the existing item at 'off' is updated to point to
+ * 'updateblkno'.
  */
 static bool
 entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
+				 void *insertdata, BlockNumber updateblkno,
 				 XLogRecData **prdata)
 {
+	GinBtreeEntryInsertData *entry = insertdata;
 	Page		page = BufferGetPage(buf);
 	OffsetNumber placed;
 	int			cnt = 0;
@@ -502,13 +503,14 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 	static ginxlogInsert data;
 
 	/* quick exit if it doesn't fit */
-	if (!entryIsEnoughSpace(btree, buf, off))
+	if (!entryIsEnoughSpace(btree, buf, off, entry))
 		return false;
 
 	*prdata = rdata;
-	data.updateBlkno = entryPreparePage(btree, page, off);
+	entryPreparePage(btree, page, off, entry, updateblkno);
+	data.updateBlkno = updateblkno;
 
-	placed = PageAddItem(page, (Item) btree->entry, IndexTupleSize(btree->entry), off, false, false);
+	placed = PageAddItem(page, (Item) entry->entry, IndexTupleSize(entry->entry), off, false, false);
 	if (placed != off)
 		elog(ERROR, "failed to add item to index page in \"%s\"",
 			 RelationGetRelationName(btree->index));
@@ -517,7 +519,7 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 	data.blkno = BufferGetBlockNumber(buf);
 	data.offset = off;
 	data.nitem = 1;
-	data.isDelete = btree->isDelete;
+	data.isDelete = entry->isDelete;
 	data.isData = false;
 	data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
 
@@ -545,12 +547,10 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 	cnt++;
 
 	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (char *) btree->entry;
-	rdata[cnt].len = IndexTupleSize(btree->entry);
+	rdata[cnt].data = (char *) entry->entry;
+	rdata[cnt].len = IndexTupleSize(entry->entry);
 	rdata[cnt].next = NULL;
 
-	btree->entry = NULL;
-
 	return true;
 }
 
@@ -561,8 +561,10 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
  * an equal number!
  */
 static Page
-entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogRecData **prdata)
+entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
+			   void *insertdata, BlockNumber updateblkno, XLogRecData **prdata)
 {
+	GinBtreeEntryInsertData *entry = insertdata;
 	OffsetNumber i,
 				maxoff,
 				separator = InvalidOffsetNumber;
@@ -583,8 +585,9 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 
 	*prdata = rdata;
 	data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
-		InvalidOffsetNumber : GinGetDownlink(btree->entry);
-	data.updateBlkno = entryPreparePage(btree, lpage, off);
+		InvalidOffsetNumber : GinGetDownlink(entry->entry);
+	data.updateBlkno = updateblkno;
+	entryPreparePage(btree, lpage, off, entry, updateblkno);
 
 	maxoff = PageGetMaxOffsetNumber(lpage);
 	ptr = tupstore;
@@ -593,8 +596,8 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 	{
 		if (i == off)
 		{
-			size = MAXALIGN(IndexTupleSize(btree->entry));
-			memcpy(ptr, btree->entry, size);
+			size = MAXALIGN(IndexTupleSize(entry->entry));
+			memcpy(ptr, entry->entry, size);
 			ptr += size;
 			totalsize += size + sizeof(ItemIdData);
 		}
@@ -608,8 +611,8 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 
 	if (off == maxoff + 1)
 	{
-		size = MAXALIGN(IndexTupleSize(btree->entry));
-		memcpy(ptr, btree->entry, size);
+		size = MAXALIGN(IndexTupleSize(entry->entry));
+		memcpy(ptr, entry->entry, size);
 		ptr += size;
 		totalsize += size + sizeof(ItemIdData);
 	}
@@ -667,20 +670,22 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off, XLogR
 }
 
 /*
- * Prepare the state in 'btree' for inserting a downlink for given buffer.
+ * Construct insertion payload for inserting the downlink for given buffer.
  */
-static void
+static void *
 entryPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
+	GinBtreeEntryInsertData *entry = palloc(sizeof(GinBtreeEntryInsertData));
 	Page		lpage = BufferGetPage(lbuf);
 	IndexTuple	itup;
 
 	itup = getRightMostTuple(lpage);
-
-	btree->entry = GinFormInteriorTuple(itup,
+	entry->entry = GinFormInteriorTuple(itup,
 										lpage,
 										BufferGetBlockNumber(lbuf));
-	btree->rightblkno = GinPageGetOpaque(lpage)->rightlink;
+	entry->isDelete = false;
+
+	return entry;
 }
 
 /*
@@ -724,6 +729,7 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 	memset(btree, 0, sizeof(GinBtreeData));
 
 	btree->index = ginstate->index;
+	btree->rootBlkno = GIN_ROOT_BLKNO;
 	btree->ginstate = ginstate;
 
 	btree->findChildPage = entryLocateEntry;
@@ -743,5 +749,4 @@ ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 	btree->entryAttnum = attnum;
 	btree->entryKey = key;
 	btree->entryCategory = category;
-	btree->isDelete = FALSE;
 }
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 2a3991f..f2ee2fb 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -374,7 +374,7 @@ restartScanEntry:
 	ginPrepareEntryScan(&btreeEntry, entry->attnum,
 						entry->queryKey, entry->queryCategory,
 						ginstate);
-	stackEntry = ginFindLeafPage(&btreeEntry, GIN_ROOT_BLKNO, true);
+	stackEntry = ginFindLeafPage(&btreeEntry, true);
 	page = BufferGetPage(stackEntry->buffer);
 	needUnlock = TRUE;
 
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 0a2883a..556e318 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -163,17 +163,20 @@ ginEntryInsert(GinState *ginstate,
 			   GinStatsData *buildStats)
 {
 	GinBtreeData btree;
+	GinBtreeEntryInsertData insertdata;
 	GinBtreeStack *stack;
 	IndexTuple	itup;
 	Page		page;
 
+	insertdata.isDelete = FALSE;
+
 	/* During index build, count the to-be-inserted entry */
 	if (buildStats)
 		buildStats->nEntries++;
 
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
-	stack = ginFindLeafPage(&btree, GIN_ROOT_BLKNO, false);
+	stack = ginFindLeafPage(&btree, false);
 	page = BufferGetPage(stack->buffer);
 
 	if (btree.findItem(&btree, stack))
@@ -201,7 +204,7 @@ ginEntryInsert(GinState *ginstate,
 		itup = addItemPointersToLeafTuple(ginstate, itup,
 										  items, nitem, buildStats);
 
-		btree.isDelete = TRUE;
+		insertdata.isDelete = TRUE;
 	}
 	else
 	{
@@ -211,8 +214,8 @@ ginEntryInsert(GinState *ginstate,
 	}
 
 	/* Insert the new or modified leaf tuple */
-	btree.entry = itup;
-	ginInsertValue(&btree, stack, buildStats);
+	insertdata.entry = itup;
+	ginInsertValue(&btree, stack, &insertdata, buildStats);
 	pfree(itup);
 }
 
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index ddac343..ca7bee1 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -774,7 +774,7 @@ ginContinueSplit(ginIncompleteSplit *split)
 	GinState	ginstate;
 	Relation	reln;
 	Buffer		buffer;
-	GinBtreeStack stack;
+	GinBtreeStack *stack;
 
 	/*
 	 * elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u",  split->rootBlkno,
@@ -802,22 +802,22 @@ ginContinueSplit(ginIncompleteSplit *split)
 	}
 	else
 	{
-		ginPrepareDataScan(&btree, reln);
+		ginPrepareDataScan(&btree, reln, split->rootBlkno);
 	}
 
-	stack.blkno = split->leftBlkno;
-	stack.buffer = buffer;
-	stack.off = InvalidOffsetNumber;
-	stack.parent = NULL;
+	stack = palloc(sizeof(GinBtreeStack));
+	stack->blkno = split->leftBlkno;
+	stack->buffer = buffer;
+	stack->off = InvalidOffsetNumber;
+	stack->parent = NULL;
 
-	ginFindParents(&btree, &stack, split->rootBlkno);
+	ginFindParents(&btree, stack);
+	LockBuffer(stack->parent->buffer, GIN_UNLOCK);
+	ginFinishSplit(&btree, stack, NULL);
 
-	btree.prepareDownlink(&btree, buffer);
-	ginInsertValue(&btree, stack.parent, NULL);
+	/* buffer is released by ginFinishSplit */
 
 	FreeFakeRelcacheEntry(reln);
-
-	UnlockReleaseBuffer(buffer);
 }
 
 void
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3952935..3aa92b3 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -485,41 +485,56 @@ typedef struct GinBtreeData
 
 	/* insert methods */
 	OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
-	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, XLogRecData **);
-	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, XLogRecData **);
-	void		(*prepareDownlink) (GinBtree, Buffer);
+	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
+	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
+	void		*(*prepareDownlink) (GinBtree, Buffer);
 	void		(*fillRoot) (GinBtree, Buffer, Buffer, Buffer);
 
 	bool		isData;
 
 	Relation	index;
+	BlockNumber	rootBlkno;
 	GinState   *ginstate;		/* not valid in a data scan */
 	bool		fullScan;
 	bool		isBuild;
 
-	BlockNumber rightblkno;
-
-	/* Entry options */
+	/* Search key for Entry tree */
 	OffsetNumber entryAttnum;
 	Datum		entryKey;
 	GinNullCategory entryCategory;
+
+	/* Search key for data tree (posting tree) */
+	ItemPointerData itemptr;
+} GinBtreeData;
+
+/* This represents a tuple to be inserted to entry tree. */
+typedef struct
+{
 	IndexTuple	entry;
 	bool		isDelete;
+} GinBtreeEntryInsertData;
 
-	/* Data (posting tree) options */
+/*
+ * This represents an itempointer, or many itempointers, to be inserted to
+ * a data (posting tree) leaf page
+ */
+typedef struct
+{
 	ItemPointerData *items;
 	uint32		nitem;
 	uint32		curitem;
+} GinBtreeDataLeafInsertData;
 
-	PostingItem pitem;
-} GinBtreeData;
+/* For internal data (posting tree) pages, use PostingItem */
 
-extern GinBtreeStack *ginFindLeafPage(GinBtree btree, BlockNumber rootBlkno, bool searchMode);
+extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
+			   void *insertdata, GinStatsData *buildStats);
+extern void ginFindParents(GinBtree btree, GinBtreeStack *stack);
+extern void ginFinishSplit(GinBtree btree, GinBtreeStack *stack,
 			   GinStatsData *buildStats);
-extern void ginFindParents(GinBtree btree, GinBtreeStack *stack, BlockNumber rootBlkno);
 
 /* ginentrypage.c */
 extern IndexTuple GinFormTuple(GinState *ginstate,
@@ -543,7 +558,7 @@ extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  GinStatsData *buildStats);
 extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
 extern void ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
-extern void ginPrepareDataScan(GinBtree btree, Relation index);
+extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
 
 /* ginscan.c */
 
-- 
1.8.4.2

0002-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patchtext/x-diff; name=0002-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patchDownload
>From 40e787b79e0bda24280a2bb642e02b0b6f047645 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 7 Nov 2013 19:38:01 +0200
Subject: [PATCH 2/2] Get rid of the post-recovery cleanup step of GIN page
 splits.

Replace it with an approach similar to what GiST uses: when a page is split,
the left sibling is marked with a flag indicating that the parent hasn't been
updated yet. When the parent is updated, the flag is cleared. If an insertion
steps on a page with the flag set, it will finish split before proceeding
with the insertion.

The post-recovery cleanup mechanism was never totally reliable, as insertion
to the parent could fail e.g because of running out of memory or disk space,
leaving the tree in an inconsistent state.

This also divides the responsibility of WAL-logging more clearly between
the generic ginbtree.c code, and the parts specific to entry and posting
trees. There is now a common WAL record format for insertions and deletions,
which is written by ginbtree.c, followed by tree-specific payload, which is
returned by the placetopage- and split- callbacks.
---
 src/backend/access/gin/ginbtree.c     | 456 ++++++++++++++++++++++----------
 src/backend/access/gin/gindatapage.c  |  91 ++-----
 src/backend/access/gin/ginentrypage.c |  74 ++----
 src/backend/access/gin/ginxlog.c      | 479 +++++++++++++++-------------------
 src/backend/access/rmgrdesc/gindesc.c |  45 +++-
 src/include/access/gin.h              |   1 -
 src/include/access/gin_private.h      |  81 ++++--
 src/include/access/rmgrlist.h         |   2 +-
 8 files changed, 672 insertions(+), 557 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 3c352e9..3691a57 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -18,8 +18,15 @@
 #include "miscadmin.h"
 #include "utils/rel.h"
 
+static void ginFindParents(GinBtree btree, GinBtreeStack *stack);
+static bool ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
+			   void *insertdata, BlockNumber updateblkno,
+			   Buffer childbuf, GinStatsData *buildStats);
+static void ginFinishSplit(GinBtree btree, GinBtreeStack *stack,
+			   bool freestack, GinStatsData *buildStats);
+
 /*
- * Locks buffer by needed method for search.
+ * Lock buffer by needed method for search.
  */
 static int
 ginTraverseLock(Buffer buffer, bool searchMode)
@@ -53,7 +60,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
 }
 
 /*
- * Descends the tree to the leaf page that contains or would contain the
+ * Descend the tree to the leaf page that contains, or would contain, the
  * key we're searching for. The key should already be filled in 'btree',
  * in tree-type specific manner. If btree->fullScan is true, descends to the
  * leftmost leaf page.
@@ -86,6 +93,13 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 		access = ginTraverseLock(stack->buffer, searchMode);
 
 		/*
+		 * If we're going to modify the tree, finish any incomplete splits we
+		 * encounter on the way.
+		 */
+		if (!searchMode && GinPageIsIncompleteSplit(page))
+			ginFinishSplit(btree, stack, false, NULL);
+
+		/*
 		 * ok, page is correctly locked, we should check to move right ..,
 		 * root never has a right link, so small optimization
 		 */
@@ -101,6 +115,9 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
 			page = BufferGetPage(stack->buffer);
+
+			if (!searchMode && GinPageIsIncompleteSplit(page))
+				ginFinishSplit(btree, stack, false, NULL);
 		}
 
 		if (GinPageIsLeaf(page))	/* we found, return locked page */
@@ -188,7 +205,7 @@ freeGinBtreeStack(GinBtreeStack *stack)
  * Function should never release root page to prevent conflicts
  * with vacuum process
  */
-void
+static void
 ginFindParents(GinBtree btree, GinBtreeStack *stack)
 {
 	Page		page;
@@ -196,58 +213,55 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
 	BlockNumber blkno,
 				leftmostBlkno;
 	OffsetNumber offset;
-	GinBtreeStack *root = stack->parent;
+	GinBtreeStack *root;
 	GinBtreeStack *ptr;
 
-	if (!root)
+	/*
+	 * Unwind the stack all the way up to the root, leaving only the root
+	 * item.
+	 *
+	 * Be careful not to release the pin on the root page! The pin on root
+	 * page is required to lock out concurrent vacuums on the tree.
+	 */
+	root = stack->parent;
+	while (root->parent)
 	{
-		/* XLog mode... */
-		root = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
-		root->blkno = btree->rootBlkno;
-		root->buffer = ReadBuffer(btree->index, btree->rootBlkno);
-		LockBuffer(root->buffer, GIN_EXCLUSIVE);
-		root->parent = NULL;
+		ReleaseBuffer(root->buffer);
+		root = root->parent;
 	}
-	else
-	{
-		/*
-		 * find root, we should not release root page until update is
-		 * finished!!
-		 */
-		while (root->parent)
-		{
-			ReleaseBuffer(root->buffer);
-			root = root->parent;
-		}
 
-		Assert(root->blkno == btree->rootBlkno);
-		Assert(BufferGetBlockNumber(root->buffer) == btree->rootBlkno);
-		LockBuffer(root->buffer, GIN_EXCLUSIVE);
-	}
+	Assert(root->blkno == btree->rootBlkno);
+	Assert(BufferGetBlockNumber(root->buffer) == btree->rootBlkno);
 	root->off = InvalidOffsetNumber;
 
-	page = BufferGetPage(root->buffer);
-	Assert(!GinPageIsLeaf(page));
-
-	/* check trivial case */
-	if ((root->off = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidOffsetNumber)
-	{
-		stack->parent = root;
-		return;
-	}
+	blkno = root->blkno;
+	buffer = root->buffer;
+	offset = InvalidOffsetNumber;
 
-	leftmostBlkno = blkno = btree->getLeftMostChild(btree, page);
-	LockBuffer(root->buffer, GIN_UNLOCK);
-	Assert(blkno != InvalidBlockNumber);
+	ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
 
 	for (;;)
 	{
-		buffer = ReadBuffer(btree->index, blkno);
 		LockBuffer(buffer, GIN_EXCLUSIVE);
 		page = BufferGetPage(buffer);
 		if (GinPageIsLeaf(page))
 			elog(ERROR, "Lost path");
 
+		if (GinPageIsIncompleteSplit(page))
+		{
+			Assert(blkno != btree->rootBlkno);
+			ptr->blkno = blkno;
+			ptr->buffer = buffer;
+			/*
+			 * parent may be wrong, but if so, the ginFinishSplit call will
+			 * recurse to call ginFindParents again to fix it.
+			 */
+			ptr->parent = root;
+			ptr->off = InvalidOffsetNumber;
+
+			ginFinishSplit(btree, ptr, false, NULL);
+		}
+
 		leftmostBlkno = btree->getLeftMostChild(btree, page);
 
 		while ((offset = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) == InvalidOffsetNumber)
@@ -260,11 +274,22 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
 			}
 			buffer = ginStepRight(buffer, btree->index, GIN_EXCLUSIVE);
 			page = BufferGetPage(buffer);
+
+			/* finish any incomplete splits, as above */
+			if (GinPageIsIncompleteSplit(page))
+			{
+				Assert(blkno != btree->rootBlkno);
+				ptr->blkno = blkno;
+				ptr->buffer = buffer;
+				ptr->parent = root;
+				ptr->off = InvalidOffsetNumber;
+
+				ginFinishSplit(btree, ptr, false, NULL);
+			}
 		}
 
 		if (blkno != InvalidBlockNumber)
 		{
-			ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
 			ptr->blkno = blkno;
 			ptr->buffer = buffer;
 			ptr->parent = root; /* it may be wrong, but in next call we will
@@ -274,7 +299,9 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
 			return;
 		}
 
+		/* Descend down to next level */
 		blkno = leftmostBlkno;
+		buffer = ReadBuffer(btree->index, blkno);
 	}
 }
 
@@ -285,19 +312,38 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
  * the parent needs to be updated. (a root update returns true as it doesn't
  * need any further action by the caller to complete)
  *
- * When inserting a downlink to a internal page, the existing item at the
- * given location is updated to point to 'updateblkno'.
+ * When inserting a downlink to a internal page, 'childbuf' contains the
+ * child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
+ * atomically with the insert. Also, the existing item at the given location
+ * is updated to point to 'updateblkno'.
  *
  * stack->buffer is locked on entry, and is kept locked.
  */
 static bool
 ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			   void *insertdata, BlockNumber updateblkno,
-			   GinStatsData *buildStats)
+			   Buffer childbuf, GinStatsData *buildStats)
 {
 	Page		page = BufferGetPage(stack->buffer);
-	XLogRecData *rdata;
+	XLogRecData *payloadrdata;
 	bool		fit;
+	uint16		xlflags = 0;
+	Page		childpage = NULL;
+
+	if (GinPageIsData(page))
+		xlflags |= GIN_INSERT_ISDATA;
+	if (GinPageIsLeaf(page))
+	{
+		xlflags |= GIN_INSERT_ISLEAF;
+		Assert(!BufferIsValid(childbuf));
+		Assert(updateblkno == InvalidBlockNumber);
+	}
+	else
+	{
+		Assert(BufferIsValid(childbuf));
+		Assert(updateblkno != InvalidBlockNumber);
+		childpage = BufferGetPage(childbuf);
+	}
 
 	/*
 	 * Try to put the incoming tuple on the page. If it doesn't fit,
@@ -307,17 +353,68 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 	START_CRIT_SECTION();
 	fit = btree->placeToPage(btree, stack->buffer, stack->off,
 							 insertdata, updateblkno,
-							 &rdata);
+							 &payloadrdata);
 	if (fit)
 	{
 		MarkBufferDirty(stack->buffer);
 
+		/* An insert to an internal page finishes the split of the child. */
+		if (childbuf != InvalidBuffer)
+		{
+			GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
+			MarkBufferDirty(childbuf);
+		}
+
 		if (RelationNeedsWAL(btree->index))
 		{
 			XLogRecPtr	recptr;
+			XLogRecData rdata[3];
+			ginxlogInsert xlrec;
+
+			xlrec.node = btree->index->rd_node;
+			xlrec.blkno = BufferGetBlockNumber(stack->buffer);
+			xlrec.offset = stack->off;
+			xlrec.flags = xlflags;
+
+			rdata[0].buffer = InvalidBuffer;
+			rdata[0].data = (char *) &xlrec;
+			rdata[0].len = sizeof(ginxlogInsert);
+
+			/*
+			 * Log information about child if this was an insertion of a
+			 * downlink.
+			 */
+			if (childbuf != InvalidBuffer)
+			{
+				struct
+				{
+					BlockNumber left;
+					BlockNumber right;
+				} children;
+
+				rdata[0].next = &rdata[1];
+
+				children.left = BufferGetBlockNumber(childbuf);
+				children.right = GinPageGetOpaque(childpage)->rightlink;
+
+				rdata[1].buffer = InvalidBuffer;
+				rdata[1].data = (char *) &children;
+				rdata[1].len = sizeof(BlockNumber) * 2;
+				rdata[1].next = &rdata[2];
+
+				rdata[2].buffer = childbuf;
+				rdata[2].buffer_std = false;
+				rdata[2].data = NULL;
+				rdata[2].len = 0;
+				rdata[2].next = payloadrdata;
+			}
+			else
+				rdata[0].next = payloadrdata;
 
 			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT, rdata);
 			PageSetLSN(page, recptr);
+			if (childbuf != InvalidBuffer)
+				PageSetLSN(childpage, recptr);
 		}
 
 		END_CRIT_SECTION();
@@ -330,13 +427,16 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		Buffer		rbuffer;
 		Page		newlpage;
 		BlockNumber savedRightLink;
-		GinBtreeStack *parent;
-		Page		lpage,
-					rpage;
+		Page		rpage;
+		XLogRecData rdata[2];
+		ginxlogSplit data;
+		Buffer		lbuffer = InvalidBuffer;
+		Page		newrootpg = NULL;
 
 		END_CRIT_SECTION();
 
 		rbuffer = GinNewBuffer(btree->index);
+
 		/* During index build, count the new page */
 		if (buildStats)
 		{
@@ -354,21 +454,50 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		 */
 		newlpage = btree->splitPage(btree, stack->buffer, rbuffer, stack->off,
 									insertdata, updateblkno,
-									&rdata);
+									&payloadrdata);
 
-		((ginxlogSplit *) (rdata->data))->rootBlkno = btree->rootBlkno;
+		data.node = btree->index->rd_node;
+		data.rblkno = BufferGetBlockNumber(rbuffer);
+		data.flags = xlflags;
+		if (childbuf != InvalidBuffer)
+		{
+			Page childpage = BufferGetPage(childbuf);
+			GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
 
-		parent = stack->parent;
+			data.leftChildBlkno = BufferGetBlockNumber(childbuf);
+			data.rightChildBlkno = GinPageGetOpaque(childpage)->rightlink;
+		}
+		else
+			data.leftChildBlkno = data.rightChildBlkno = InvalidBlockNumber;
 
-		if (parent == NULL)
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) &data;
+		rdata[0].len = sizeof(ginxlogSplit);
+
+		if (childbuf != InvalidBuffer)
+		{
+			rdata[0].next = &rdata[1];
+
+			rdata[1].buffer = childbuf;
+			rdata[1].buffer_std = false;
+			rdata[1].data = NULL;
+			rdata[1].len = 0;
+			rdata[1].next = payloadrdata;
+		}
+		else
+			rdata[0].next = payloadrdata;
+
+		rpage = BufferGetPage(rbuffer);
+
+		if (stack->parent == NULL)
 		{
 			/*
 			 * split root, so we need to allocate new left page and place
 			 * pointer on root to left and right page
 			 */
-			Buffer		lbuffer = GinNewBuffer(btree->index);
+			lbuffer = GinNewBuffer(btree->index);
 
-			/* During index build, count the new page */
+			/* During index build, count the newly-added root page */
 			if (buildStats)
 			{
 				if (btree->isData)
@@ -377,82 +506,97 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 					buildStats->nEntryPages++;
 			}
 
-			((ginxlogSplit *) (rdata->data))->isRootSplit = TRUE;
-			((ginxlogSplit *) (rdata->data))->rrlink = InvalidBlockNumber;
-
-			lpage = BufferGetPage(lbuffer);
-			rpage = BufferGetPage(rbuffer);
+			/*
+			 * root never has a right-link, so we borrow the rrlink field to
+			 * store the root block number.
+			 */
+			data.rrlink = BufferGetBlockNumber(stack->buffer);
+			data.lblkno = BufferGetBlockNumber(lbuffer);
+			data.flags |= GIN_SPLIT_ROOT;
 
 			GinPageGetOpaque(rpage)->rightlink = InvalidBlockNumber;
 			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
-			((ginxlogSplit *) (rdata->data))->lblkno = BufferGetBlockNumber(lbuffer);
-
-			START_CRIT_SECTION();
 
-			GinInitBuffer(stack->buffer, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF);
-			PageRestoreTempPage(newlpage, lpage);
-			btree->fillRoot(btree, stack->buffer, lbuffer, rbuffer);
-
-			MarkBufferDirty(rbuffer);
-			MarkBufferDirty(lbuffer);
-			MarkBufferDirty(stack->buffer);
-
-			if (RelationNeedsWAL(btree->index))
-			{
-				XLogRecPtr	recptr;
-
-				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-				PageSetLSN(page, recptr);
-				PageSetLSN(lpage, recptr);
-				PageSetLSN(rpage, recptr);
-			}
-
-			UnlockReleaseBuffer(rbuffer);
-			UnlockReleaseBuffer(lbuffer);
-			END_CRIT_SECTION();
-
-			/* During index build, count the newly-added root page */
-			if (buildStats)
-			{
-				if (btree->isData)
-					buildStats->nDataPages++;
-				else
-					buildStats->nEntryPages++;
-			}
+			/*
+			 * Construct a new root page containing downlinks to the new left
+			 * and right pages. (do this in a temporary copy first rather
+			 * than overwriting the original page directly, so that we can still
+			 * abort gracefully if this fails.)
+			 */
+			newrootpg = PageGetTempPage(rpage);
+			GinInitPage(newrootpg, GinPageGetOpaque(newlpage)->flags & ~GIN_LEAF, BLCKSZ);
 
-			return true;
+			btree->fillRoot(btree, newrootpg,
+							BufferGetBlockNumber(lbuffer), newlpage,
+							BufferGetBlockNumber(rbuffer), rpage);
 		}
 		else
 		{
 			/* split non-root page */
-			((ginxlogSplit *) (rdata->data))->isRootSplit = FALSE;
-			((ginxlogSplit *) (rdata->data))->rrlink = savedRightLink;
-
-			lpage = BufferGetPage(stack->buffer);
-			rpage = BufferGetPage(rbuffer);
+			data.rrlink = savedRightLink;
+			data.lblkno = BufferGetBlockNumber(stack->buffer);
 
 			GinPageGetOpaque(rpage)->rightlink = savedRightLink;
+			GinPageGetOpaque(newlpage)->flags |= GIN_INCOMPLETE_SPLIT;
 			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+		}
 
-			START_CRIT_SECTION();
-			PageRestoreTempPage(newlpage, lpage);
+		/*
+		 * Ok, we have the new contents of the left page in a temporary copy
+		 * now (newlpage), and the newly-allocated right block has been filled
+		 * in. The original page is still unchanged.
+		 *
+		 * If this is a root split, we also have a temporary page containing
+		 * the new contents of the root. Copy the new left page to a
+		 * newly-allocated block, and initialize the (original) root page the
+		 * new copy. Otherwise, copy over the temporary copy of the new left
+		 * page over the old left page.
+		 */
 
-			MarkBufferDirty(rbuffer);
-			MarkBufferDirty(stack->buffer);
+		START_CRIT_SECTION();
 
-			if (RelationNeedsWAL(btree->index))
-			{
-				XLogRecPtr	recptr;
+		MarkBufferDirty(rbuffer);
 
-				recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
-				PageSetLSN(lpage, recptr);
-				PageSetLSN(rpage, recptr);
-			}
-			UnlockReleaseBuffer(rbuffer);
-			END_CRIT_SECTION();
+		if (stack->parent == NULL)
+		{
+			PageRestoreTempPage(newlpage, BufferGetPage(lbuffer));
+			MarkBufferDirty(lbuffer);
+			newlpage = newrootpg;
+		}
 
-			return false;
+		PageRestoreTempPage(newlpage, BufferGetPage(stack->buffer));
+		MarkBufferDirty(stack->buffer);
+
+		/* write WAL record */
+		if (RelationNeedsWAL(btree->index))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT, rdata);
+			PageSetLSN(BufferGetPage(stack->buffer), recptr);
+			PageSetLSN(rpage, recptr);
+			if (stack->parent == NULL)
+				PageSetLSN(BufferGetPage(lbuffer), recptr);
 		}
+		END_CRIT_SECTION();
+
+		/*
+		 * We can release the lock on the right page now, but keep the
+		 * original buffer locked.
+		 */
+		UnlockReleaseBuffer(rbuffer);
+		if (stack->parent == NULL)
+			UnlockReleaseBuffer(lbuffer);
+
+		/*
+		 * If we split the root, we're done. Otherwise the split is not
+		 * complete until the downlink for the new page has been inserted to
+		 * the parent.
+		 */
+		if (stack->parent == NULL)
+			return true;
+		else
+			return false;
 	}
 }
 
@@ -461,13 +605,27 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
  *
  * On entry, stack->buffer is exclusively locked.
  *
- * NB: the passed-in stack is freed, as though by freeGinBtreeStack.
+ * If freestack is true, all the buffers are released and unlocked as we
+ * crawl up the tree, and 'stack' is freed. Otherwise stack->buffer is kepy
+ * locked, and stack is unmodified, except for possibly moving right to find
+ * the correct parent of page.
  */
-void
-ginFinishSplit(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
+static void
+ginFinishSplit(GinBtree btree, GinBtreeStack *stack, bool freestack,
+			   GinStatsData *buildStats)
 {
 	Page		page;
 	bool		done;
+	bool		first = true;
+
+	/*
+	 * freestack == false when we encounter an incompletely split page during a
+	 * scan, while freestack == true is used in the normal scenario that a
+	 * split is finished right after the initial insert.
+	 */
+	if (!freestack)
+		elog(DEBUG1, "finishing incomplete split of block %u in gin index \"%s\"",
+			 stack->blkno, RelationGetRelationName(btree->index));
 
 	/* this loop crawls up the stack until the insertion is complete */
 	do
@@ -476,19 +634,26 @@ ginFinishSplit(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 		void	   *insertdata;
 		BlockNumber updateblkno;
 
-		insertdata = btree->prepareDownlink(btree, stack->buffer);
-		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
-
 		/* search parent to lock */
 		LockBuffer(parent->buffer, GIN_EXCLUSIVE);
 
+		/*
+		 * If the parent page was incompletely split, finish that split first,
+		 * then continue with the current one.
+		 *
+		 * Note: we have to finish *all* incomplete splits we encounter, even
+		 * if we have to move right. Otherwise we might choose as the target
+		 * a page that has no downlink in the parent, and splitting it further
+		 * would fail.
+		 */
+		if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+			ginFinishSplit(btree, parent, false, buildStats);
+
 		/* move right if it's needed */
 		page = BufferGetPage(parent->buffer);
 		while ((parent->off = btree->findChildPtr(btree, page, stack->blkno, parent->off)) == InvalidOffsetNumber)
 		{
-			BlockNumber rightlink = GinPageGetOpaque(page)->rightlink;
-
-			if (rightlink == InvalidBlockNumber)
+			if (GinPageRightMost(page))
 			{
 				/*
 				 * rightmost page, but we don't find parent, we should use
@@ -502,25 +667,44 @@ ginFinishSplit(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 			}
 
 			parent->buffer = ginStepRight(parent->buffer, btree->index, GIN_EXCLUSIVE);
-			parent->blkno = rightlink;
+			parent->blkno = BufferGetBlockNumber(parent->buffer);
 			page = BufferGetPage(parent->buffer);
-		}
 
-		/* release the child */
-		UnlockReleaseBuffer(stack->buffer);
-		pfree(stack);
-		stack = parent;
+			if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+				ginFinishSplit(btree, parent, false, buildStats);
+		}
 
-		/* insert the downlink to parent */
-		done = ginPlaceToPage(btree, stack,
+		/* insert the downlink */
+		insertdata = btree->prepareDownlink(btree, stack->buffer);
+		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
+		done = ginPlaceToPage(btree, parent,
 							  insertdata, updateblkno,
-							  buildStats);
+							  stack->buffer, buildStats);
 		pfree(insertdata);
+
+		/*
+		 * If the caller requested to free the stack, unlock and release the
+		 * child buffer now. Otherwise keep it pinned and locked, but if we
+		 * have to recurse up the tree, we can unlock the upper pages, only
+		 * keeping the page at the bottom of the stack locked.
+		 */
+		if (!first || freestack)
+			LockBuffer(stack->buffer, GIN_UNLOCK);
+		if (freestack)
+		{
+			ReleaseBuffer(stack->buffer);
+			pfree(stack);
+		}
+		stack = parent;
+
+		first = false;
 	} while (!done);
+
+	/* unlock the parent */
 	LockBuffer(stack->buffer, GIN_UNLOCK);
 
-	/* free the rest of the stack */
-	freeGinBtreeStack(stack);
+	if (freestack)
+		freeGinBtreeStack(stack);
 }
 
 /*
@@ -541,14 +725,18 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, void *insertdata,
 {
 	bool		done;
 
+	/* If the leaf page was incompletely split, finish the split first */
+	if (GinPageIsIncompleteSplit(BufferGetPage(stack->buffer)))
+		ginFinishSplit(btree, stack, false, buildStats);
+
 	done = ginPlaceToPage(btree, stack,
 						  insertdata, InvalidBlockNumber,
-						  buildStats);
+						  InvalidBuffer, buildStats);
 	if (done)
 	{
 		LockBuffer(stack->buffer, GIN_UNLOCK);
 		freeGinBtreeStack(stack);
 	}
 	else
-		ginFinishSplit(btree, stack, buildStats);
+		ginFinishSplit(btree, stack, true, buildStats);
 }
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 647d383..0d7537b 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -227,6 +227,7 @@ GinDataPageAddItemPointer(Page page, ItemPointer data, OffsetNumber offset)
 	OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
 	char	   *ptr;
 
+	Assert(ItemPointerIsValid(data));
 	Assert(GinPageIsLeaf(page));
 
 	if (offset == InvalidOffsetNumber)
@@ -255,6 +256,7 @@ GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset)
 	OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
 	char	   *ptr;
 
+	Assert(PostingItemGetBlockNumber(data) != InvalidBlockNumber);
 	Assert(!GinPageIsLeaf(page));
 
 	if (offset == InvalidOffsetNumber)
@@ -337,11 +339,8 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 				XLogRecData **prdata)
 {
 	Page		page = BufferGetPage(buf);
-	int			cnt = 0;
-
 	/* these must be static so they can be returned to caller */
-	static XLogRecData rdata[3];
-	static ginxlogInsert data;
+	static XLogRecData rdata[2];
 
 	/* quick exit if it doesn't fit */
 	if (!dataIsEnoughSpace(btree, buf, off, insertdata))
@@ -357,45 +356,10 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 		PostingItemSetBlockNumber(pitem, updateblkno);
 	}
 
-	data.updateBlkno = updateblkno;
-	data.node = btree->index->rd_node;
-	data.blkno = BufferGetBlockNumber(buf);
-	data.offset = off;
-	data.nitem = 1;
-	data.isDelete = FALSE;
-	data.isData = TRUE;
-	data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
-
-	/*
-	 * Prevent full page write if child's split occurs. That is needed to
-	 * remove incomplete splits while replaying WAL
-	 *
-	 * data.updateBlkno contains new block number (of newly created right
-	 * page) for recently splited page.
-	 */
-	if (data.updateBlkno == InvalidBlockNumber)
-	{
-		rdata[0].buffer = buf;
-		rdata[0].buffer_std = FALSE;
-		rdata[0].data = NULL;
-		rdata[0].len = 0;
-		rdata[0].next = &rdata[1];
-		cnt++;
-	}
-
-	rdata[cnt].buffer = InvalidBuffer;
-	rdata[cnt].data = (char *) &data;
-	rdata[cnt].len = sizeof(ginxlogInsert);
-	rdata[cnt].next = &rdata[cnt + 1];
-	cnt++;
-
-	rdata[cnt].buffer = InvalidBuffer;
-	/* data and len filled in below */
-	rdata[cnt].next = NULL;
-
 	if (GinPageIsLeaf(page))
 	{
 		GinBtreeDataLeafInsertData *items = insertdata;
+		static ginxlogInsertDataLeaf data;
 		uint32		savedPos = items->curitem;
 
 		if (GinPageRightMost(page) && off > GinPageGetOpaque(page)->maxoff)
@@ -413,18 +377,28 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 		{
 			GinDataPageAddItemPointer(page, items->items + items->curitem, off);
 			items->curitem++;
+			data.nitem = 1;
 		}
 
-		rdata[cnt].data = (char *) &items->items[savedPos];
-		rdata[cnt].len = sizeof(ItemPointerData) * data.nitem;
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) &data;
+		rdata[0].len = offsetof(ginxlogInsertDataLeaf, items);
+		rdata[0].next = &rdata[1];
+
+		rdata[1].buffer = InvalidBuffer;
+		rdata[1].data = (char *) &items->items[savedPos];
+		rdata[1].len = sizeof(ItemPointerData) * data.nitem;
+		rdata[1].next = NULL;
 	}
 	else
 	{
 		PostingItem *pitem = insertdata;
 		GinDataPageAddPostingItem(page, pitem, off);
 
-		rdata[cnt].data = (char *) pitem;
-		rdata[cnt].len = sizeof(PostingItem);
+		rdata[0].buffer = InvalidBuffer;
+		rdata[0].data = (char *) pitem;
+		rdata[0].len = sizeof(PostingItem);
+		rdata[0].next = NULL;
 	}
 
 	return true;
@@ -453,8 +427,8 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 	Size		freeSpace;
 
 	/* these must be static so they can be returned to caller */
-	static ginxlogSplit data;
-	static XLogRecData rdata[4];
+	static ginxlogSplitData data;
+	static XLogRecData rdata[2];
 	static char vector[2 * BLCKSZ];
 
 	GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
@@ -484,6 +458,7 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 
 	if (isleaf && GinPageRightMost(lpage) && off > GinPageGetOpaque(lpage)->maxoff)
 	{
+		/* append new items to the end */
 		GinBtreeDataLeafInsertData *items = insertdata;
 		while (items->curitem < items->nitem &&
 			   maxoff * sizeof(ItemPointerData) < 2 * (freeSpace - sizeof(ItemPointerData)))
@@ -559,25 +534,18 @@ dataSplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 	bound = GinDataPageGetRightBound(rpage);
 	*bound = oldbound;
 
-	data.node = btree->index->rd_node;
-	data.rootBlkno = InvalidBlockNumber;
-	data.lblkno = BufferGetBlockNumber(lbuf);
-	data.rblkno = BufferGetBlockNumber(rbuf);
 	data.separator = separator;
 	data.nitem = maxoff;
-	data.isData = TRUE;
-	data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
-	data.isRootSplit = FALSE;
 	data.rightbound = oldbound;
 
 	rdata[0].buffer = InvalidBuffer;
 	rdata[0].data = (char *) &data;
-	rdata[0].len = sizeof(ginxlogSplit);
+	rdata[0].len = sizeof(ginxlogSplitData);
 	rdata[0].next = &rdata[1];
 
 	rdata[1].buffer = InvalidBuffer;
 	rdata[1].data = vector;
-	rdata[1].len = MAXALIGN(maxoff * sizeofitem);
+	rdata[1].len = maxoff * sizeofitem;
 	rdata[1].next = NULL;
 
 	return lpage;
@@ -603,21 +571,18 @@ dataPrepareDownlink(GinBtree btree, Buffer lbuf)
  * Also called from ginxlog, should not use btree
  */
 void
-ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
+ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage)
 {
-	Page		page = BufferGetPage(root),
-				lpage = BufferGetPage(lbuf),
-				rpage = BufferGetPage(rbuf);
 	PostingItem li,
 				ri;
 
 	li.key = *GinDataPageGetRightBound(lpage);
-	PostingItemSetBlockNumber(&li, BufferGetBlockNumber(lbuf));
-	GinDataPageAddPostingItem(page, &li, InvalidOffsetNumber);
+	PostingItemSetBlockNumber(&li, lblkno);
+	GinDataPageAddPostingItem(root, &li, InvalidOffsetNumber);
 
 	ri.key = *GinDataPageGetRightBound(rpage);
-	PostingItemSetBlockNumber(&ri, BufferGetBlockNumber(rbuf));
-	GinDataPageAddPostingItem(page, &ri, InvalidOffsetNumber);
+	PostingItemSetBlockNumber(&ri, rblkno);
+	GinDataPageAddPostingItem(root, &ri, InvalidOffsetNumber);
 }
 
 /*
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 56cbf8d..1751e34 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -500,7 +500,7 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 
 	/* these must be static so they can be returned to caller */
 	static XLogRecData rdata[3];
-	static ginxlogInsert data;
+	static ginxlogInsertEntry data;
 
 	/* quick exit if it doesn't fit */
 	if (!entryIsEnoughSpace(btree, buf, off, entry))
@@ -508,41 +508,17 @@ entryPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off,
 
 	*prdata = rdata;
 	entryPreparePage(btree, page, off, entry, updateblkno);
-	data.updateBlkno = updateblkno;
 
 	placed = PageAddItem(page, (Item) entry->entry, IndexTupleSize(entry->entry), off, false, false);
 	if (placed != off)
 		elog(ERROR, "failed to add item to index page in \"%s\"",
 			 RelationGetRelationName(btree->index));
 
-	data.node = btree->index->rd_node;
-	data.blkno = BufferGetBlockNumber(buf);
-	data.offset = off;
-	data.nitem = 1;
 	data.isDelete = entry->isDelete;
-	data.isData = false;
-	data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE;
-
-	/*
-	 * Prevent full page write if child's split occurs. That is needed to
-	 * remove incomplete splits while replaying WAL
-	 *
-	 * data.updateBlkno contains new block number (of newly created right
-	 * page) for recently splited page.
-	 */
-	if (data.updateBlkno == InvalidBlockNumber)
-	{
-		rdata[0].buffer = buf;
-		rdata[0].buffer_std = TRUE;
-		rdata[0].data = NULL;
-		rdata[0].len = 0;
-		rdata[0].next = &rdata[1];
-		cnt++;
-	}
 
 	rdata[cnt].buffer = InvalidBuffer;
 	rdata[cnt].data = (char *) &data;
-	rdata[cnt].len = sizeof(ginxlogInsert);
+	rdata[cnt].len = offsetof(ginxlogInsertEntry, tuple);
 	rdata[cnt].next = &rdata[cnt + 1];
 	cnt++;
 
@@ -569,6 +545,7 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 				maxoff,
 				separator = InvalidOffsetNumber;
 	Size		totalsize = 0;
+	Size		tupstoresize;
 	Size		lsize = 0,
 				size;
 	char	   *ptr;
@@ -580,18 +557,18 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 
 	/* these must be static so they can be returned to caller */
 	static XLogRecData rdata[2];
-	static ginxlogSplit data;
+	static ginxlogSplitEntry data;
 	static char tupstore[2 * BLCKSZ];
 
 	*prdata = rdata;
-	data.leftChildBlkno = (GinPageIsLeaf(lpage)) ?
-		InvalidOffsetNumber : GinGetDownlink(entry->entry);
-	data.updateBlkno = updateblkno;
 	entryPreparePage(btree, lpage, off, entry, updateblkno);
 
+	/*
+	 * First, append all the existing tuples and the new tuple we're inserting
+	 * one after another in a temporary workspace.
+	 */
 	maxoff = PageGetMaxOffsetNumber(lpage);
 	ptr = tupstore;
-
 	for (i = FirstOffsetNumber; i <= maxoff; i++)
 	{
 		if (i == off)
@@ -616,7 +593,12 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 		ptr += size;
 		totalsize += size + sizeof(ItemIdData);
 	}
+	tupstoresize = ptr - tupstore;
 
+	/*
+	 * Initialize the left and right pages, and copy all the tuples back to
+	 * them.
+	 */
 	GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
 	GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);
 
@@ -646,24 +628,17 @@ entrySplitPage(GinBtree btree, Buffer lbuf, Buffer rbuf, OffsetNumber off,
 		ptr += MAXALIGN(IndexTupleSize(itup));
 	}
 
-	data.node = btree->index->rd_node;
-	data.rootBlkno = InvalidBlockNumber;
-	data.lblkno = BufferGetBlockNumber(lbuf);
-	data.rblkno = BufferGetBlockNumber(rbuf);
 	data.separator = separator;
 	data.nitem = maxoff;
-	data.isData = FALSE;
-	data.isLeaf = GinPageIsLeaf(lpage) ? TRUE : FALSE;
-	data.isRootSplit = FALSE;
 
 	rdata[0].buffer = InvalidBuffer;
 	rdata[0].data = (char *) &data;
-	rdata[0].len = sizeof(ginxlogSplit);
+	rdata[0].len = sizeof(ginxlogSplitEntry);
 	rdata[0].next = &rdata[1];
 
 	rdata[1].buffer = InvalidBuffer;
 	rdata[1].data = tupstore;
-	rdata[1].len = MAXALIGN(totalsize);
+	rdata[1].len = tupstoresize;
 	rdata[1].next = NULL;
 
 	return lpage;
@@ -693,24 +668,19 @@ entryPrepareDownlink(GinBtree btree, Buffer lbuf)
  * Also called from ginxlog, should not use btree
  */
 void
-ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf)
+ginEntryFillRoot(GinBtree btree, Page root,
+				 BlockNumber lblkno, Page lpage,
+				 BlockNumber rblkno, Page rpage)
 {
-	Page		page = BufferGetPage(root);
-	Page		lpage = BufferGetPage(lbuf);
-	Page		rpage = BufferGetPage(rbuf);
 	IndexTuple	itup;
 
-	itup = GinFormInteriorTuple(getRightMostTuple(lpage),
-								lpage,
-								BufferGetBlockNumber(lbuf));
-	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+	itup = GinFormInteriorTuple(getRightMostTuple(lpage), lpage, lblkno);
+	if (PageAddItem(root, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
 
-	itup = GinFormInteriorTuple(getRightMostTuple(rpage),
-								rpage,
-								BufferGetBlockNumber(rbuf));
-	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+	itup = GinFormInteriorTuple(getRightMostTuple(rpage), rpage, rblkno);
+	if (PageAddItem(root, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
 		elog(ERROR, "failed to add item to index root page");
 	pfree(itup);
 }
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index ca7bee1..c524656 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -18,55 +18,27 @@
 #include "utils/memutils.h"
 
 static MemoryContext opCtx;		/* working memory for operations */
-static MemoryContext topCtx;
-
-typedef struct ginIncompleteSplit
-{
-	RelFileNode node;
-	BlockNumber leftBlkno;
-	BlockNumber rightBlkno;
-	BlockNumber rootBlkno;
-} ginIncompleteSplit;
-
-static List *incomplete_splits;
 
 static void
-pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber rightBlkno, BlockNumber rootBlkno)
+ginRedoClearIncompleteSplit(XLogRecPtr lsn, RelFileNode node, BlockNumber blkno)
 {
-	ginIncompleteSplit *split;
-
-	MemoryContextSwitchTo(topCtx);
-
-	split = palloc(sizeof(ginIncompleteSplit));
-
-	split->node = node;
-	split->leftBlkno = leftBlkno;
-	split->rightBlkno = rightBlkno;
-	split->rootBlkno = rootBlkno;
-
-	incomplete_splits = lappend(incomplete_splits, split);
-
-	MemoryContextSwitchTo(opCtx);
-}
+	Buffer		buffer;
+	Page		page;
 
-static void
-forgetIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber updateBlkno)
-{
-	ListCell   *l;
+	buffer = XLogReadBuffer(node, blkno, false);
+	if (!BufferIsValid(buffer))
+		return;					/* page was deleted, nothing to do */
+	page = (Page) BufferGetPage(buffer);
 
-	foreach(l, incomplete_splits)
+	if (lsn > PageGetLSN(page))
 	{
-		ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
+		GinPageGetOpaque(page)->flags &= ~GIN_INCOMPLETE_SPLIT;
 
-		if (RelFileNodeEquals(node, split->node) &&
-			leftBlkno == split->leftBlkno &&
-			updateBlkno == split->rightBlkno)
-		{
-			incomplete_splits = list_delete_ptr(incomplete_splits, split);
-			pfree(split);
-			break;
-		}
+		PageSetLSN(page, lsn);
+		MarkBufferDirty(buffer);
 	}
+
+	UnlockReleaseBuffer(buffer);
 }
 
 static void
@@ -128,44 +100,105 @@ ginRedoCreatePTree(XLogRecPtr lsn, XLogRecord *record)
 }
 
 static void
-ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
+ginRedoInsertEntry(Buffer buffer, OffsetNumber offset, BlockNumber rightblkno,
+				   void *rdata)
 {
-	ginxlogInsert *data = (ginxlogInsert *) XLogRecGetData(record);
-	Buffer		buffer;
-	Page		page;
+	Page		page = BufferGetPage(buffer);
+	ginxlogInsertEntry *data = (ginxlogInsertEntry *) rdata;
+	IndexTuple	itup;
 
-	/* first, forget any incomplete split this insertion completes */
-	if (data->isData)
+	if (rightblkno != InvalidBlockNumber)
 	{
-		Assert(data->isDelete == FALSE);
-		if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		{
-			PostingItem *pitem;
+		/* update link to right page after split */
+		Assert(!GinPageIsLeaf(page));
+		Assert(offset >= FirstOffsetNumber && offset <= PageGetMaxOffsetNumber(page));
+		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+		GinSetDownlink(itup, rightblkno);
+	}
 
-			pitem = (PostingItem *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-			forgetIncompleteSplit(data->node,
-								  PostingItemGetBlockNumber(pitem),
-								  data->updateBlkno);
-		}
+	if (data->isDelete)
+	{
+		Assert(GinPageIsLeaf(page));
+		Assert(offset >= FirstOffsetNumber && offset <= PageGetMaxOffsetNumber(page));
+		PageIndexTupleDelete(page, offset);
+	}
+
+	itup = &data->tuple;
+
+	if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), offset, false, false) == InvalidOffsetNumber)
+	{
+		RelFileNode node;
+		ForkNumber forknum;
+		BlockNumber blknum;
 
+		BufferGetTag(buffer, &node, &forknum, &blknum);
+		elog(ERROR, "failed to add item to index page in %u/%u/%u",
+			 node.spcNode, node.dbNode, node.relNode);
+	}
+}
+
+static void
+ginRedoInsertData(Buffer buffer, OffsetNumber offset, BlockNumber rightblkno,
+				  void *rdata)
+{
+	Page		page = BufferGetPage(buffer);
+
+	if (GinPageIsLeaf(page))
+	{
+		ginxlogInsertDataLeaf *data = (ginxlogInsertDataLeaf *) rdata;
+		ItemPointerData *items = data->items;
+		OffsetNumber i;
+
+		for (i = 0; i < data->nitem; i++)
+			GinDataPageAddItemPointer(page, &items[i], offset + i);
 	}
 	else
 	{
-		if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		{
-			IndexTuple	itup;
+		PostingItem *pitem = (PostingItem *) rdata;
+		PostingItem *oldpitem;
 
-			itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-			forgetIncompleteSplit(data->node,
-								  GinGetDownlink(itup),
-								  data->updateBlkno);
-		}
+		/* update link to right page after split */
+		oldpitem = GinDataPageGetPostingItem(page, offset);
+		PostingItemSetBlockNumber(oldpitem, rightblkno);
+
+		GinDataPageAddPostingItem(page, pitem, offset);
+	}
+}
+
+static void
+ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
+{
+	ginxlogInsert *data = (ginxlogInsert *) XLogRecGetData(record);
+	Buffer		buffer;
+	Page		page;
+	char	   *payload;
+	BlockNumber leftChildBlkno = InvalidBlockNumber;
+	BlockNumber rightChildBlkno = InvalidBlockNumber;
+	bool		isLeaf = (data->flags & GIN_INSERT_ISLEAF) != 0;
+
+	payload = XLogRecGetData(record) + sizeof(ginxlogInsert);
+
+	/*
+	 * First clear incomplete-split flag on child page if this finishes
+	 * a split
+	 */
+	if (!isLeaf)
+	{
+		memcpy(&leftChildBlkno, payload, sizeof(BlockNumber));
+		payload += sizeof(BlockNumber);
+		memcpy(&rightChildBlkno, payload, sizeof(BlockNumber));
+		payload += sizeof(BlockNumber);
+
+		if (record->xl_info & XLR_BKP_BLOCK(0))
+			(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		else
+			ginRedoClearIncompleteSplit(lsn, data->node, leftChildBlkno);
 	}
 
 	/* If we have a full-page image, restore it and we're done */
-	if (record->xl_info & XLR_BKP_BLOCK(0))
+	if (record->xl_info & XLR_BKP_BLOCK(isLeaf ? 0 : 1))
 	{
-		(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		(void) RestoreBackupBlock(lsn, record, isLeaf ? 0 : 1, false, false);
 		return;
 	}
 
@@ -176,74 +209,82 @@ ginRedoInsert(XLogRecPtr lsn, XLogRecord *record)
 
 	if (lsn > PageGetLSN(page))
 	{
-		if (data->isData)
+		/* How to insert the payload is tree-type specific */
+		if (data->flags & GIN_INSERT_ISDATA)
 		{
 			Assert(GinPageIsData(page));
-
-			if (data->isLeaf)
-			{
-				OffsetNumber i;
-				ItemPointerData *items = (ItemPointerData *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-
-				Assert(GinPageIsLeaf(page));
-				Assert(data->updateBlkno == InvalidBlockNumber);
-
-				for (i = 0; i < data->nitem; i++)
-					GinDataPageAddItemPointer(page, &items[i], data->offset + i);
-			}
-			else
-			{
-				PostingItem *pitem;
-
-				Assert(!GinPageIsLeaf(page));
-
-				if (data->updateBlkno != InvalidBlockNumber)
-				{
-					/* update link to right page after split */
-					pitem = GinDataPageGetPostingItem(page, data->offset);
-					PostingItemSetBlockNumber(pitem, data->updateBlkno);
-				}
-
-				pitem = (PostingItem *) (XLogRecGetData(record) + sizeof(ginxlogInsert));
-
-				GinDataPageAddPostingItem(page, pitem, data->offset);
-			}
+			ginRedoInsertData(buffer, data->offset, rightChildBlkno, payload);
 		}
 		else
 		{
-			IndexTuple	itup;
-
 			Assert(!GinPageIsData(page));
+			ginRedoInsertEntry(buffer, data->offset, rightChildBlkno, payload);
+		}
 
-			if (data->updateBlkno != InvalidBlockNumber)
-			{
-				/* update link to right page after split */
-				Assert(!GinPageIsLeaf(page));
-				Assert(data->offset >= FirstOffsetNumber && data->offset <= PageGetMaxOffsetNumber(page));
-				itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, data->offset));
-				GinSetDownlink(itup, data->updateBlkno);
-			}
+		PageSetLSN(page, lsn);
+		MarkBufferDirty(buffer);
+	}
 
-			if (data->isDelete)
-			{
-				Assert(GinPageIsLeaf(page));
-				Assert(data->offset >= FirstOffsetNumber && data->offset <= PageGetMaxOffsetNumber(page));
-				PageIndexTupleDelete(page, data->offset);
-			}
+	UnlockReleaseBuffer(buffer);
+}
 
-			itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogInsert));
+static void
+ginRedoSplitEntry(Page lpage, Page rpage, void *rdata)
+{
+	ginxlogSplitEntry *data = (ginxlogSplitEntry *) rdata;
+	IndexTuple	itup = (IndexTuple) ((char *) rdata + sizeof(ginxlogSplitEntry));
+	OffsetNumber i;
 
-			if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), data->offset, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-		}
+	for (i = 0; i < data->separator; i++)
+	{
+		if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+			elog(ERROR, "failed to add item to gin index page");
+		itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
+	}
 
-		PageSetLSN(page, lsn);
+	for (i = data->separator; i < data->nitem; i++)
+	{
+		if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
+			elog(ERROR, "failed to add item to gin index page");
+		itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
+	}
+}
 
-		MarkBufferDirty(buffer);
+static void
+ginRedoSplitData(Page lpage, Page rpage, void *rdata)
+{
+	ginxlogSplitData *data = (ginxlogSplitData *) rdata;
+	bool		isleaf = GinPageIsLeaf(lpage);
+	char	   *ptr = (char *) rdata + sizeof(ginxlogSplitData);
+	OffsetNumber i;
+	ItemPointer bound;
+
+	if (isleaf)
+	{
+		ItemPointer items = (ItemPointer) ptr;
+		for (i = 0; i < data->separator; i++)
+			GinDataPageAddItemPointer(lpage, &items[i], InvalidOffsetNumber);
+		for (i = data->separator; i < data->nitem; i++)
+			GinDataPageAddItemPointer(rpage, &items[i], InvalidOffsetNumber);
+	}
+	else
+	{
+		PostingItem *items = (PostingItem *) ptr;
+		for (i = 0; i < data->separator; i++)
+			GinDataPageAddPostingItem(lpage, &items[i], InvalidOffsetNumber);
+		for (i = data->separator; i < data->nitem; i++)
+			GinDataPageAddPostingItem(rpage, &items[i], InvalidOffsetNumber);
 	}
 
-	UnlockReleaseBuffer(buffer);
+	/* set up right key */
+	bound = GinDataPageGetRightBound(lpage);
+	if (isleaf)
+		*bound = *GinDataPageGetItemPointer(lpage, GinPageGetOpaque(lpage)->maxoff);
+	else
+		*bound = GinDataPageGetPostingItem(lpage, GinPageGetOpaque(lpage)->maxoff)->key;
+
+	bound = GinDataPageGetRightBound(rpage);
+	*bound = data->rightbound;
 }
 
 static void
@@ -255,14 +296,30 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	Page		lpage,
 				rpage;
 	uint32		flags = 0;
+	char	   *payload;
+	bool		isLeaf = (data->flags & GIN_INSERT_ISLEAF) != 0;
+	bool		isData = (data->flags & GIN_INSERT_ISDATA) != 0;
+	bool		isRoot = (data->flags & GIN_SPLIT_ROOT) != 0;
+
+	payload = XLogRecGetData(record) + sizeof(ginxlogSplit);
+
+	/*
+	 * First clear incomplete-split flag on child page if this finishes
+	 * a split
+	 */
+	if (!isLeaf)
+	{
+		if (record->xl_info & XLR_BKP_BLOCK(0))
+			(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		else
+			ginRedoClearIncompleteSplit(lsn, data->node, data->leftChildBlkno);
+	}
 
-	if (data->isLeaf)
+	if (isLeaf)
 		flags |= GIN_LEAF;
-	if (data->isData)
-		flags |= GIN_DATA;
 
-	/* Backup blocks are not used in split records */
-	Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
+	if (isData)
+		flags |= GIN_DATA;
 
 	lbuffer = XLogReadBuffer(data->node, data->lblkno, true);
 	Assert(BufferIsValid(lbuffer));
@@ -275,64 +332,13 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	GinInitBuffer(rbuffer, flags);
 
 	GinPageGetOpaque(lpage)->rightlink = BufferGetBlockNumber(rbuffer);
-	GinPageGetOpaque(rpage)->rightlink = data->rrlink;
-
-	if (data->isData)
-	{
-		char	   *ptr = XLogRecGetData(record) + sizeof(ginxlogSplit);
-		Size		sizeofitem = GinSizeOfDataPageItem(lpage);
-		OffsetNumber i;
-		ItemPointer bound;
-
-		for (i = 0; i < data->separator; i++)
-		{
-			if (data->isLeaf)
-				GinDataPageAddItemPointer(lpage, (ItemPointer) ptr, InvalidOffsetNumber);
-			else
-				GinDataPageAddPostingItem(lpage, (PostingItem *) ptr, InvalidOffsetNumber);
-			ptr += sizeofitem;
-		}
-
-		for (i = data->separator; i < data->nitem; i++)
-		{
-			if (data->isLeaf)
-				GinDataPageAddItemPointer(rpage, (ItemPointer) ptr, InvalidOffsetNumber);
-			else
-				GinDataPageAddPostingItem(rpage, (PostingItem *) ptr, InvalidOffsetNumber);
-			ptr += sizeofitem;
-		}
+	GinPageGetOpaque(rpage)->rightlink = isRoot ? InvalidBlockNumber : data->rrlink;
 
-		/* set up right key */
-		bound = GinDataPageGetRightBound(lpage);
-		if (data->isLeaf)
-			*bound = *GinDataPageGetItemPointer(lpage, GinPageGetOpaque(lpage)->maxoff);
-		else
-			*bound = GinDataPageGetPostingItem(lpage, GinPageGetOpaque(lpage)->maxoff)->key;
-
-		bound = GinDataPageGetRightBound(rpage);
-		*bound = data->rightbound;
-	}
+	/* Do the tree-type specific portion to restore the page contents */
+	if (isData)
+		ginRedoSplitData(lpage, rpage, payload);
 	else
-	{
-		IndexTuple	itup = (IndexTuple) (XLogRecGetData(record) + sizeof(ginxlogSplit));
-		OffsetNumber i;
-
-		for (i = 0; i < data->separator; i++)
-		{
-			if (PageAddItem(lpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-			itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
-		}
-
-		for (i = data->separator; i < data->nitem; i++)
-		{
-			if (PageAddItem(rpage, (Item) itup, IndexTupleSize(itup), InvalidOffsetNumber, false, false) == InvalidOffsetNumber)
-				elog(ERROR, "failed to add item to index page in %u/%u/%u",
-				  data->node.spcNode, data->node.dbNode, data->node.relNode);
-			itup = (IndexTuple) (((char *) itup) + MAXALIGN(IndexTupleSize(itup)));
-		}
-	}
+		ginRedoSplitEntry(lpage, rpage, payload);
 
 	PageSetLSN(rpage, lsn);
 	MarkBufferDirty(rbuffer);
@@ -340,25 +346,31 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 	PageSetLSN(lpage, lsn);
 	MarkBufferDirty(lbuffer);
 
-	if (!data->isLeaf && data->updateBlkno != InvalidBlockNumber)
-		forgetIncompleteSplit(data->node, data->leftChildBlkno, data->updateBlkno);
-
-	if (data->isRootSplit)
+	if (isRoot)
 	{
-		Buffer		rootBuf = XLogReadBuffer(data->node, data->rootBlkno, true);
+		BlockNumber	rootBlkno = data->rrlink;
+		Buffer		rootBuf = XLogReadBuffer(data->node, rootBlkno, true);
 		Page		rootPage = BufferGetPage(rootBuf);
 
 		GinInitBuffer(rootBuf, flags & ~GIN_LEAF);
 
-		if (data->isData)
+		if (isData)
 		{
-			Assert(data->rootBlkno != GIN_ROOT_BLKNO);
-			ginDataFillRoot(NULL, rootBuf, lbuffer, rbuffer);
+			Assert(rootBlkno != GIN_ROOT_BLKNO);
+			ginDataFillRoot(NULL, BufferGetPage(rootBuf),
+							BufferGetBlockNumber(lbuffer),
+							BufferGetPage(lbuffer),
+							BufferGetBlockNumber(rbuffer),
+							BufferGetPage(rbuffer));
 		}
 		else
 		{
-			Assert(data->rootBlkno == GIN_ROOT_BLKNO);
-			ginEntryFillRoot(NULL, rootBuf, lbuffer, rbuffer);
+			Assert(rootBlkno == GIN_ROOT_BLKNO);
+			ginEntryFillRoot(NULL, BufferGetPage(rootBuf),
+							 BufferGetBlockNumber(lbuffer),
+							 BufferGetPage(lbuffer),
+							 BufferGetBlockNumber(rbuffer),
+							 BufferGetPage(rbuffer));
 		}
 
 		PageSetLSN(rootPage, lsn);
@@ -366,8 +378,6 @@ ginRedoSplit(XLogRecPtr lsn, XLogRecord *record)
 		MarkBufferDirty(rootBuf);
 		UnlockReleaseBuffer(rootBuf);
 	}
-	else
-		pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno);
 
 	UnlockReleaseBuffer(rbuffer);
 	UnlockReleaseBuffer(lbuffer);
@@ -711,6 +721,7 @@ void
 gin_redo(XLogRecPtr lsn, XLogRecord *record)
 {
 	uint8		info = record->xl_info & ~XLR_INFO_MASK;
+	MemoryContext oldCtx;
 
 	/*
 	 * GIN indexes do not require any conflict processing. NB: If we ever
@@ -718,7 +729,7 @@ gin_redo(XLogRecPtr lsn, XLogRecord *record)
 	 * killed tuples outside VACUUM, we'll need to handle that here.
 	 */
 
-	topCtx = MemoryContextSwitchTo(opCtx);
+	oldCtx = MemoryContextSwitchTo(opCtx);
 	switch (info)
 	{
 		case XLOG_GIN_CREATE_INDEX:
@@ -751,15 +762,13 @@ gin_redo(XLogRecPtr lsn, XLogRecord *record)
 		default:
 			elog(PANIC, "gin_redo: unknown op code %u", info);
 	}
-	MemoryContextSwitchTo(topCtx);
+	MemoryContextSwitchTo(oldCtx);
 	MemoryContextReset(opCtx);
 }
 
 void
 gin_xlog_startup(void)
 {
-	incomplete_splits = NIL;
-
 	opCtx = AllocSetContextCreate(CurrentMemoryContext,
 								  "GIN recovery temporary context",
 								  ALLOCSET_DEFAULT_MINSIZE,
@@ -767,84 +776,8 @@ gin_xlog_startup(void)
 								  ALLOCSET_DEFAULT_MAXSIZE);
 }
 
-static void
-ginContinueSplit(ginIncompleteSplit *split)
-{
-	GinBtreeData btree;
-	GinState	ginstate;
-	Relation	reln;
-	Buffer		buffer;
-	GinBtreeStack *stack;
-
-	/*
-	 * elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u",  split->rootBlkno,
-	 * split->leftBlkno, split->rightBlkno);
-	 */
-	buffer = XLogReadBuffer(split->node, split->leftBlkno, false);
-
-	/*
-	 * Failure should be impossible here, because we wrote the page earlier.
-	 */
-	if (!BufferIsValid(buffer))
-		elog(PANIC, "ginContinueSplit: left block %u not found",
-			 split->leftBlkno);
-
-	reln = CreateFakeRelcacheEntry(split->node);
-
-	if (split->rootBlkno == GIN_ROOT_BLKNO)
-	{
-		MemSet(&ginstate, 0, sizeof(ginstate));
-		ginstate.index = reln;
-
-		ginPrepareEntryScan(&btree,
-							InvalidOffsetNumber, (Datum) 0, GIN_CAT_NULL_KEY,
-							&ginstate);
-	}
-	else
-	{
-		ginPrepareDataScan(&btree, reln, split->rootBlkno);
-	}
-
-	stack = palloc(sizeof(GinBtreeStack));
-	stack->blkno = split->leftBlkno;
-	stack->buffer = buffer;
-	stack->off = InvalidOffsetNumber;
-	stack->parent = NULL;
-
-	ginFindParents(&btree, stack);
-	LockBuffer(stack->parent->buffer, GIN_UNLOCK);
-	ginFinishSplit(&btree, stack, NULL);
-
-	/* buffer is released by ginFinishSplit */
-
-	FreeFakeRelcacheEntry(reln);
-}
-
 void
 gin_xlog_cleanup(void)
 {
-	ListCell   *l;
-	MemoryContext topCtx;
-
-	topCtx = MemoryContextSwitchTo(opCtx);
-
-	foreach(l, incomplete_splits)
-	{
-		ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
-
-		ginContinueSplit(split);
-		MemoryContextReset(opCtx);
-	}
-
-	MemoryContextSwitchTo(topCtx);
 	MemoryContextDelete(opCtx);
-	incomplete_splits = NIL;
-}
-
-bool
-gin_safe_restartpoint(void)
-{
-	if (incomplete_splits)
-		return false;
-	return true;
 }
diff --git a/src/backend/access/rmgrdesc/gindesc.c b/src/backend/access/rmgrdesc/gindesc.c
index 391f75f..c534c3a 100644
--- a/src/backend/access/rmgrdesc/gindesc.c
+++ b/src/backend/access/rmgrdesc/gindesc.c
@@ -41,20 +41,45 @@ gin_desc(StringInfo buf, uint8 xl_info, char *rec)
 			desc_node(buf, ((ginxlogCreatePostingTree *) rec)->node, ((ginxlogCreatePostingTree *) rec)->blkno);
 			break;
 		case XLOG_GIN_INSERT:
-			appendStringInfoString(buf, "Insert item, ");
-			desc_node(buf, ((ginxlogInsert *) rec)->node, ((ginxlogInsert *) rec)->blkno);
-			appendStringInfo(buf, " offset: %u nitem: %u isdata: %c isleaf %c isdelete %c updateBlkno:%u",
-							 ((ginxlogInsert *) rec)->offset,
-							 ((ginxlogInsert *) rec)->nitem,
-							 (((ginxlogInsert *) rec)->isData) ? 'T' : 'F',
-							 (((ginxlogInsert *) rec)->isLeaf) ? 'T' : 'F',
-							 (((ginxlogInsert *) rec)->isDelete) ? 'T' : 'F',
-							 ((ginxlogInsert *) rec)->updateBlkno);
+			{
+				ginxlogInsert *xlrec = (ginxlogInsert *) rec;
+				char	*payload = rec + sizeof(ginxlogInsert);
+
+				appendStringInfoString(buf, "Insert item, ");
+				desc_node(buf, xlrec->node, xlrec->blkno);
+				appendStringInfo(buf, " offset: %u isdata: %c isleaf: %c",
+								 xlrec->offset,
+								 (xlrec->flags & GIN_SPLIT_ISDATA) ? 'T' : 'F',
+								 (xlrec->flags & GIN_SPLIT_ISLEAF) ? 'T' : 'F');
+				if (!(xlrec->flags & GIN_SPLIT_ISLEAF))
+				{
+					BlockNumber leftChildBlkno;
+					BlockNumber rightChildBlkno;
+
+					memcpy(&leftChildBlkno, payload, sizeof(BlockNumber));
+					payload += sizeof(BlockNumber);
+					memcpy(&rightChildBlkno, payload, sizeof(BlockNumber));
+					payload += sizeof(BlockNumber);
+					appendStringInfo(buf, " children: %u/%u",
+									 leftChildBlkno, rightChildBlkno);
+				}
+				if (!(xlrec->flags & GIN_SPLIT_ISDATA))
+					appendStringInfo(buf, " isdelete: %c",
+									 (((ginxlogInsertEntry *) payload)->isDelete) ? 'T' : 'F');
+				else if (xlrec->flags & GIN_SPLIT_ISLEAF)
+					appendStringInfo(buf, " nitem: %u",
+									 (((ginxlogInsertDataLeaf *) payload)->nitem) ? 'T' : 'F');
+				else
+					appendStringInfo(buf, " pitem: %u-%u/%u",
+									 PostingItemGetBlockNumber((PostingItem *) payload),
+									 ItemPointerGetBlockNumber(&((PostingItem *) payload)->key),
+									 ItemPointerGetOffsetNumber(&((PostingItem *) payload)->key));
+			}
 			break;
 		case XLOG_GIN_SPLIT:
 			appendStringInfoString(buf, "Page split, ");
 			desc_node(buf, ((ginxlogSplit *) rec)->node, ((ginxlogSplit *) rec)->lblkno);
-			appendStringInfo(buf, " isrootsplit: %c", (((ginxlogSplit *) rec)->isRootSplit) ? 'T' : 'F');
+			appendStringInfo(buf, " isrootsplit: %c", (((ginxlogSplit *) rec)->flags & GIN_SPLIT_ROOT) ? 'T' : 'F');
 			break;
 		case XLOG_GIN_VACUUM_PAGE:
 			appendStringInfoString(buf, "Vacuum page, ");
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index b6cb48d..7dcb0e0 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -58,6 +58,5 @@ extern void gin_redo(XLogRecPtr lsn, XLogRecord *record);
 extern void gin_desc(StringInfo buf, uint8 xl_info, char *rec);
 extern void gin_xlog_startup(void);
 extern void gin_xlog_cleanup(void);
-extern bool gin_safe_restartpoint(void);
 
 #endif   /* GIN_H */
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3aa92b3..d27ff7d 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -48,6 +48,7 @@ typedef GinPageOpaqueData *GinPageOpaque;
 #define GIN_META		  (1 << 3)
 #define GIN_LIST		  (1 << 4)
 #define GIN_LIST_FULLROW  (1 << 5)		/* makes sense only on GIN_LIST page */
+#define GIN_INCOMPLETE_SPLIT (1 << 6)	/* page was split, but parent not updated */
 
 /* Page numbers of fixed-location pages */
 #define GIN_METAPAGE_BLKNO	(0)
@@ -119,6 +120,7 @@ typedef struct GinMetaPageData
 #define GinPageIsDeleted(page) ( GinPageGetOpaque(page)->flags & GIN_DELETED)
 #define GinPageSetDeleted(page)    ( GinPageGetOpaque(page)->flags |= GIN_DELETED)
 #define GinPageSetNonDeleted(page) ( GinPageGetOpaque(page)->flags &= ~GIN_DELETED)
+#define GinPageIsIncompleteSplit(page) ( GinPageGetOpaque(page)->flags & GIN_INCOMPLETE_SPLIT)
 
 #define GinPageRightMost(page) ( GinPageGetOpaque(page)->rightlink == InvalidBlockNumber)
 
@@ -336,41 +338,77 @@ typedef struct ginxlogInsert
 {
 	RelFileNode node;
 	BlockNumber blkno;
-	BlockNumber updateBlkno;
+	uint16		flags;		/* GIN_SPLIT_ISLEAF and/or GIN_SPLIT_ISDATA */
 	OffsetNumber offset;
-	bool		isDelete;
-	bool		isData;
-	bool		isLeaf;
-	OffsetNumber nitem;
 
 	/*
-	 * follows: tuples or ItemPointerData or PostingItem or list of
-	 * ItemPointerData
+	 * FOLLOWS:
+	 * if !isLeaf, left and right child block numbers of the child pages whose
+	 * split this insertion finishes. (non-aligned!)
+	 */
+
+	/*
+	 * follows: one of the following structs, depending on tree type.
+	 *
+	 * NB: the below structs are only 16-bit aligned when appended to a
+	 * ginxlogInsert struct! Beware of adding fields to them that require
+	 * stricter alignment.
 	 */
 } ginxlogInsert;
 
+typedef struct
+{
+	bool		isDelete;
+	IndexTupleData tuple;	/* variable length */
+} ginxlogInsertEntry;
+
+typedef struct
+{
+	OffsetNumber nitem;
+	ItemPointerData items[1]; /* variable length */
+} ginxlogInsertDataLeaf;
+
+/* In an insert to an internal data page, the payload is a PostingItem */
+
+
 #define XLOG_GIN_SPLIT	0x30
 
 typedef struct ginxlogSplit
 {
 	RelFileNode node;
 	BlockNumber lblkno;
-	BlockNumber rootBlkno;
 	BlockNumber rblkno;
-	BlockNumber rrlink;
+	BlockNumber rrlink;				/* right link, or root's blocknumber if root split */
+	BlockNumber	leftChildBlkno;		/* valid on a non-leaf split */
+	BlockNumber	rightChildBlkno;
+	uint16		flags;
+
+	/* follows: one of the following structs */
+} ginxlogSplit;
+
+/*
+ * Flags used in ginxlogInsert and ginxlogSplit records
+ */
+#define GIN_INSERT_ISDATA	0x01	/* for both insert and split records */
+#define GIN_INSERT_ISLEAF	0x02	/* .. */
+#define GIN_SPLIT_ROOT		0x04	/* only for split records */
+
+typedef struct
+{
 	OffsetNumber separator;
 	OffsetNumber nitem;
 
-	bool		isData;
-	bool		isLeaf;
-	bool		isRootSplit;
+	/* FOLLOWS: IndexTuples */
+} ginxlogSplitEntry;
 
-	BlockNumber leftChildBlkno;
-	BlockNumber updateBlkno;
+typedef struct
+{
+	OffsetNumber separator;
+	OffsetNumber nitem;
+	ItemPointerData rightbound;
 
-	ItemPointerData rightbound; /* used only in posting tree */
-	/* follows: list of tuple or ItemPointerData or PostingItem */
-} ginxlogSplit;
+	/* FOLLOWS: array of ItemPointers (for leaf) or PostingItems (non-leaf) */
+} ginxlogSplitData;
 
 #define XLOG_GIN_VACUUM_PAGE	0x40
 
@@ -488,7 +526,7 @@ typedef struct GinBtreeData
 	bool		(*placeToPage) (GinBtree, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
 	Page		(*splitPage) (GinBtree, Buffer, Buffer, OffsetNumber, void *, BlockNumber, XLogRecData **);
 	void		*(*prepareDownlink) (GinBtree, Buffer);
-	void		(*fillRoot) (GinBtree, Buffer, Buffer, Buffer);
+	void		(*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, Page);
 
 	bool		isData;
 
@@ -532,9 +570,6 @@ extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
 			   void *insertdata, GinStatsData *buildStats);
-extern void ginFindParents(GinBtree btree, GinBtreeStack *stack);
-extern void ginFinishSplit(GinBtree btree, GinBtreeStack *stack,
-			   GinStatsData *buildStats);
 
 /* ginentrypage.c */
 extern IndexTuple GinFormTuple(GinState *ginstate,
@@ -544,7 +579,7 @@ extern void GinShortenTuple(IndexTuple itup, uint32 nipd);
 extern void ginPrepareEntryScan(GinBtree btree, OffsetNumber attnum,
 					Datum key, GinNullCategory category,
 					GinState *ginstate);
-extern void ginEntryFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
+extern void ginEntryFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 
 /* gindatapage.c */
 extern BlockNumber createPostingTree(Relation index,
@@ -557,7 +592,7 @@ extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
 extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
-extern void ginDataFillRoot(GinBtree btree, Buffer root, Buffer lbuf, Buffer rbuf);
+extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
 
 /* ginscan.c */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 7ad71b3..166689d 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -38,7 +38,7 @@ PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL)
 PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, NULL, NULL, NULL)
 PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint)
 PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, NULL, NULL, NULL)
-PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint)
+PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, NULL)
 PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL)
 PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, NULL, NULL, NULL)
 PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_xlog_startup, spg_xlog_cleanup, NULL)
-- 
1.8.4.2

#6Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#5)
1 attachment(s)
Re: Handling GIN incomplete splits

On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Here's a new version. To ease the review, I split the remaining patch again
into two, where the first patch is just yet more refactoring.

I also fixed some bugs in WAL logging and replay that I bumped into while
testing.

Cool. Here is the review of the two remaining patches.
1) More refactoring, general remarks:
- Code compiles without warnings
- Passes make check
- If I got it correctly... this patch separates the part managing data
to be inserted from ginbtree. I can understand the meaning behind that
if we consider that GinBtree is used only to define methods and search
conditions (flags and keys). However I am wondering if this does not
make the code more complicated... Particularly the flag isDelete that
is only passed to ginxlogInsert meritates its comment IMO. Note that I
haven't read the 2nd patch when writing that :)
- With this patch, previous SELECT example takes 5.236 ms in average,
runtime does not change.
1-1) This block of code is repeated several times and should be
refactored into a single function:
/* During index build, count the new page */
if (buildStats)
{
    if (btree->isData)
        buildStats->nDataPages++;
    else
        buildStats->nEntryPages++;
}
Something with a function like that perhaps:
static void ginCountNewPage(GinBtree btree, GinStatsData *buildStats);
1-2) Could it be possible to change the variable name of
"GinBtreeEntryInsertData *entry" in entryIsEnoughSpace? entry->entry
is kind of hard to apprehend... Renaming it to either insertEntry.
Another idea would be to rename entry in GinBtreeEntryInsertData to
entryData or entryTuple.
1-3) This block of code is present two times:
+       if (!isleaf)
+       {
+               PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+               PostingItemSetBlockNumber(pitem, updateblkno);
+       }
Should the update of a downlink to point to the next page be a
separate function?
2) post recovery cleanup:
- OK, so roughly the soul of this patch is to change the update
mechanism for a left child gin page so as the parent split is always
done first before any new data is inserted in this child. And this
ensures that we can remove the xlog cleanup mechanism for gin page
splits in the same fashion as gist... xlog redo mechanism is then
adapted according to that.
- Compilation fails becausze the flags GIN_SPLIT_ISLEAF and
GIN_SPLIT_ISDATA are missing in gin_private.h. The patch attached
fixes that though.
- With my additional patch, it passes make check, compilation shows no warnings.
- I did some tests with the patch:
-- Index creation time
vanilla: 3266.834
with the two patches: 3412.473 ms
-- Tried the new redo mechanism by simulating some server failures a
couple of times and saw no failures
- I am seeing similar run times for queries like the example used in
previous emails of this thread. So no problem on this side.
- And... Here are some comments about the code:
2-1) In ginFindParents, is the case where the stack has no parent
possible (aka the stack is the root itself)? Shouldn't this code path
check if root is NULL or not?
2-2) Not sure that this structure is in-line with the project policy:
struct
{
       BlockNumber left;
       BlockNumber right;
} children;
Why not adding a complementary structure in gin_private.h doing that?
It could be used as well in ginxlogSplit to specify a left/right
family of block numbers.
2-3) s/kepy/kept (comments of ginFinishSplit)
Other than that, the patch looks great. I particularly like the new
redo logic in ginxlog.c, the code is more easily understandable with
the split redo removed. Even if I am just a noob for this code, it is
nicely built and structured.

Regards,
--
Michael

Attachments:

0003-gin-flag-fix.patchtext/plain; charset=US-ASCII; name=0003-gin-flag-fix.patchDownload
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index d27ff7d..382a23c 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -48,7 +48,9 @@ typedef GinPageOpaqueData *GinPageOpaque;
 #define GIN_META		  (1 << 3)
 #define GIN_LIST		  (1 << 4)
 #define GIN_LIST_FULLROW  (1 << 5)		/* makes sense only on GIN_LIST page */
-#define GIN_INCOMPLETE_SPLIT (1 << 6)	/* page was split, but parent not updated */
+#define GIN_SPLIT_ISLEAF  (1 << 6)
+#define GIN_SPLIT_ISDATA  (1 << 7)
+#define GIN_INCOMPLETE_SPLIT (1 << 8)	/* page was split, but parent not updated */
 
 /* Page numbers of fixed-location pages */
 #define GIN_METAPAGE_BLKNO	(0)
#7Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#6)
Re: Handling GIN incomplete splits

On 11/22/13 15:04, Michael Paquier wrote:

On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Here's a new version. To ease the review, I split the remaining patch again
into two, where the first patch is just yet more refactoring.

I also fixed some bugs in WAL logging and replay that I bumped into while
testing.

Cool. Here is the review of the two remaining patches.
1) More refactoring, general remarks:
- Code compiles without warnings
- Passes make check
- If I got it correctly... this patch separates the part managing data
to be inserted from ginbtree. I can understand the meaning behind that
if we consider that GinBtree is used only to define methods and search
conditions (flags and keys). However I am wondering if this does not
make the code more complicated...

Well, I think it's an improvement in readability to separate the
insertion payload from the search information. With the second patch it
becomes more important, because if an incompletely split page is
encountered while you're inserting a value, you needs to insert the
downlink for the old incomplete split first, and then continue the
insertion of the original vaule where you left. That "context switching"
becomes a lot easier when value you're inserting is passed around
separately.

Particularly the flag isDelete that
is only passed to ginxlogInsert meritates its comment IMO. Note that I
haven't read the 2nd patch when writing that :)

Yeah, that gets removed in the second patch, which changes the WAL
record format. :-)

1-2) Could it be possible to change the variable name of
"GinBtreeEntryInsertData *entry" in entryIsEnoughSpace? entry->entry
is kind of hard to apprehend... Renaming it to either insertEntry.
Another idea would be to rename entry in GinBtreeEntryInsertData to
entryData or entryTuple.

ok, renamed the variable to insertData.

1-3) This block of code is present two times:
+       if (!isleaf)
+       {
+               PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+               PostingItemSetBlockNumber(pitem, updateblkno);
+       }
Should the update of a downlink to point to the next page be a
separate function?

Doesn't seem any better to me.

Thanks, committed this refactoring patch now. Will continue on the main
patch.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jeff Janes
jeff.janes@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: Handling GIN incomplete splits

On Wed, Nov 13, 2013 at 8:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com

wrote:

Here's another part of my crusade against xlog cleanup routines. This
series of patches gets rid of the gin_cleanup() function, which is
currently used to finish splits of GIN b-tree pages, if the system crashes
(or an error occurs) between splitting a page and inserting its downlink to
the parent.

The first three patches just move code around. IMHO they make the code
more readable, so they should be committed in any case. The meat is in the
fourth patch.

Thoughts, objections?

Alexander, I'm sorry if this conflicts with your GIN patches. Feel free to
post the latest versions of your patches against the current master,
ignoring patches. I can fix the bitrot. That said, I think these
refactorings will make your code look a little bit nicer too, so you might
want to rebase because of that anyway.

Hi Heikki,

The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
LWLock code during the operation below, with it trying to take
an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
same lockid.

CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
WITH (FASTUPDATE=OFF);

It happens pretty reliably using osm2pgsql.

I will try to come up with a simple reproducible demonstration, and stack
trace, over the weekend.

Cheers,

Jeff

#9Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#6)
Re: Handling GIN incomplete splits

On 11/22/13 15:04, Michael Paquier wrote:

2) post recovery cleanup:
- OK, so roughly the soul of this patch is to change the update
mechanism for a left child gin page so as the parent split is always
done first before any new data is inserted in this child. And this
ensures that we can remove the xlog cleanup mechanism for gin page
splits in the same fashion as gist... xlog redo mechanism is then
adapted according to that.

- I did some tests with the patch:
-- Index creation time
vanilla: 3266.834
with the two patches: 3412.473 ms

Hmm. I didn't expect any change in index creation time. Is that
repeatable, or within the margin of error?

2-1) In ginFindParents, is the case where the stack has no parent
possible (aka the stack is the root itself)? Shouldn't this code path
check if root is NULL or not?

No. A root split doesn't need to find the parent (because there isn't
any), so we never call ginFindParents with a stack containing just the
root. If you look at the only caller of ginFindParents, it's quite clear
that stack->parent always exists.

2-2) Not sure that this structure is in-line with the project policy:
struct
{
BlockNumber left;
BlockNumber right;
} children;
Why not adding a complementary structure in gin_private.h doing that?
It could be used as well in ginxlogSplit to specify a left/right
family of block numbers.

I don't think we have a project policy against in-line structs. Might be
better style to not do it, anyway, though.

Come to think of it, that children variable mustn't be declared inside
the if-statement; it's no longer in scope when the XLogInsert was done,
so the &children was pointing to potentially uninitialized piece of
stack. Valgrind would've complained.

I ended up using BlockIdData[2] for that.

Committed, thanks for the review!

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#8)
Re: Handling GIN incomplete splits

On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Nov 13, 2013 at 8:49 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

Here's another part of my crusade against xlog cleanup routines. This
series of patches gets rid of the gin_cleanup() function, which is
currently used to finish splits of GIN b-tree pages, if the system crashes
(or an error occurs) between splitting a page and inserting its downlink to
the parent.

The first three patches just move code around. IMHO they make the code
more readable, so they should be committed in any case. The meat is in the
fourth patch.

Thoughts, objections?

Alexander, I'm sorry if this conflicts with your GIN patches. Feel free
to post the latest versions of your patches against the current master,
ignoring patches. I can fix the bitrot. That said, I think these
refactorings will make your code look a little bit nicer too, so you might
want to rebase because of that anyway.

Hi Heikki,

The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
LWLock code during the operation below, with it trying to take
an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
same lockid.

CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
WITH (FASTUPDATE=OFF);

It happens pretty reliably using osm2pgsql.

I will try to come up with a simple reproducible demonstration, and stack
trace, over the weekend.

Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
"More GIN refactoring".

Cheers,

Jeff

#11Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Jeff Janes (#10)
Re: Handling GIN incomplete splits

On 12/01/2013 10:40 PM, Jeff Janes wrote:

On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
LWLock code during the operation below, with it trying to take
an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
same lockid.

CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
WITH (FASTUPDATE=OFF);

It happens pretty reliably using osm2pgsql.

I will try to come up with a simple reproducible demonstration, and stack
trace, over the weekend.

Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
"More GIN refactoring".

That's good, I guess :-). Thanks for the testing. Did you import the
full planet.osm? I tried with a subset containing just Finland, but
didn't see any problems.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jeff Janes
jeff.janes@gmail.com
In reply to: Heikki Linnakangas (#11)
Re: Handling GIN incomplete splits

On Mon, Dec 2, 2013 at 1:26 AM, Heikki Linnakangas
<hlinnakangas@vmware.com>wrote:

On 12/01/2013 10:40 PM, Jeff Janes wrote:

On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in

the
LWLock code during the operation below, with it trying to take
an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
same lockid.

CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
WITH (FASTUPDATE=OFF);

It happens pretty reliably using osm2pgsql.

I will try to come up with a simple reproducible demonstration, and stack
trace, over the weekend.

Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
"More GIN refactoring".

That's good, I guess :-). Thanks for the testing. Did you import the full
planet.osm? I tried with a subset containing just Finland, but didn't see
any problems.

I used Antarctica. I don't have the RAM to process the full planet, or the
bandwidth to download it very easily.

Do you think it is worth chasing down where the problem was, to make sure
it was truly fixed rather than simply changed in a way that happens not to
trigger any more in this situation?

Cheers,

Jeff