Review: GIN non-intrusive vacuum of posting tree

Started by Jeff Davisalmost 9 years ago24 messages
#1Jeff Davis
pgsql@j-davis.com

/messages/by-id/CAJEAwVE4UAmm8fr+NW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw@mail.gmail.com

Let me re-summarize what's been done here to make sure I understand:

Each key in GIN has its own posting tree, which is itself a btree,
holding all of the tuples that contain that key. This posting tree is
really just a set of tuples, and searches always scan an entire
posting tree (because all the tuples contain the key, so all are a
potential match).

Currently, the cleanup process requires blocking all reads and writes
in the posting list. I assume this is only a problem when a key is
common enough that its posting list is quite large (how large?).

This patch makes a couple changes to improve concurrency:

1. Vacuum leaves first, without removing any pages. Instead, just keep
track of pages which are completely empty (more generally, subtrees
that contain empty pages).

2. Free the empty pages in those subtrees. That requires blocking
reads and writes to that subtree, but not the entire posting list.
Blocking writes is easy: acquiring a cleanup lock on the root of any
subtree always blocks any writes to that subtree, because the write
keeps a pin on all pages in that path. Blocking reads is accomplished
by locking the page to be deleted, then locking/unlocking the page one
left of that (which may be outside the subtree?).

Maybe a basic question, but why is the posting list a btree at all,
and not, say a doubly-linked list?

Review of the code itself:

* Nice and simple, with a net line count of only +45.
* Remove commented-out code.
* In the README, "leaf-to-right" should be left-to-right; and "takinf"
should be "taking".
* When you say the leftmost page is never deleted; do you mean the
leftmost of the entire posting tree, or leftmost of the subtree that
you are removing pages from?
* In order to keep out concurrent reads, you need to lock/unlock the
left page while holding exclusive lock on the page being deleted, but
I didn't see how that happens exactly in the code. Where does that
happen?

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#1)
1 attachment(s)
Re: Review: GIN non-intrusive vacuum of posting tree

Hi, Jeff!

Thanks for review!

Here's a patch corrected according to your suggestions.

2017-01-19 11:48 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

/messages/by-id/CAJEAwVE4UAmm8fr+NW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw@mail.gmail.com

Let me re-summarize what's been done here to make sure I understand:

Each key in GIN has its own posting tree, which is itself a btree,
holding all of the tuples that contain that key. This posting tree is
really just a set of tuples, and searches always scan an entire
posting tree (because all the tuples contain the key, so all are a
potential match).

Currently, the cleanup process requires blocking all reads and writes
in the posting list. I assume this is only a problem when a key is
common enough that its posting list is quite large (how large?).

The posting tree shall be at least 3 levels high to make this patch
useful. I guess it's at least 10000 postings of a single term
(assuming average hundred of tuples per page).

This patch makes a couple changes to improve concurrency:

1. Vacuum leaves first, without removing any pages. Instead, just keep
track of pages which are completely empty (more generally, subtrees
that contain empty pages).

2. Free the empty pages in those subtrees. That requires blocking
reads and writes to that subtree, but not the entire posting list.
Blocking writes is easy: acquiring a cleanup lock on the root of any
subtree always blocks any writes to that subtree, because the write
keeps a pin on all pages in that path. Blocking reads is accomplished
by locking the page to be deleted, then locking/unlocking the page one
left of that (which may be outside the subtree?).

No, leftmost page within tree. It may be referenced by rightlink from
outside the subtree, that's the reason we ceep it alive even if it's
empy.

Maybe a basic question, but why is the posting list a btree at all,
and not, say a doubly-linked list?

When GIN executes searches usually it have to fetch documents which
contains intersection of posting lists. It is faster to intersect
B-trees, if common elements are rare.

Review of the code itself:

* Nice and simple, with a net line count of only +45.
* Remove commented-out code.

I've removed the code. Though it's purpose was to accent changed
tricky concurrency places

* In the README, "leaf-to-right" should be left-to-right; and "takinf"
should be "taking".

Fixed

* When you say the leftmost page is never deleted; do you mean the
leftmost of the entire posting tree, or leftmost of the subtree that
you are removing pages from?

Leftmost of a subtree, containing totally empty subtree. It's not
neccesarily leftmost page of whole posting tree.

* In order to keep out concurrent reads, you need to lock/unlock the
left page while holding exclusive lock on the page being deleted, but
I didn't see how that happens exactly in the code. Where does that
happen?

in function ginDeletePage()
LockBuffer(lBuffer, GIN_EXCLUSIVE);
We have to mark this page dirty, since it's rightlink is changed. We
cannot mark it dirty without locking it, even if we surely know that
no any reader can reach it out (due to cleanup lock at the root of
cleaned subtree).

Thank you for your suggestions, do not hesitate to ask any questions,
concurrency and GIN both are very interesting topics.

Best regards, Andrey Borodin.

Attachments:

gin_nonintrusive_vacuum_v2.difftext/plain; charset=US-ASCII; name=gin_nonintrusive_vacuum_v2.diffDownload
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index fade0cb..990b5ff 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -314,10 +314,17 @@ deleted.
 The previous paragraph's reasoning only applies to searches, and only to
 posting trees. To protect from inserters following a downlink to a deleted
 page, vacuum simply locks out all concurrent insertions to the posting tree,
-by holding a super-exclusive lock on the posting tree root. Inserters hold a
-pin on the root page, but searches do not, so while new searches cannot begin
-while root page is locked, any already-in-progress scans can continue
-concurrently with vacuum. In the entry tree, we never delete pages.
+by holding a super-exclusive lock on the parent page of subtree with deletable
+pages. Inserters hold a pin on the root page, but searches do not, so while
+new searches cannot begin while root page is locked, any already-in-progress
+scans can continue concurrently with vacuum in corresponding subtree of
+posting tree. To exclude interference with readers vacuum takes exclusive
+locks in a depth-first scan in left-to-right order of page tuples. Leftmost
+page is never deleted. Thus before deleting any page we obtain exclusive
+lock on any left page, effectively excluding deadlock with any reader, despite
+taking parent lock before current and left lock after current. We take left
+lock not for a concurrency reasons, but rather in need to mark page dirty.
+In the entry tree, we never delete pages.
 
 (This is quite different from the mechanism the btree indexam uses to make
 page-deletions safe; it stamps the deleted pages with an XID and keeps the
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index a0afec4..dc28c81 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -30,7 +30,7 @@ static void ginFinishSplit(GinBtree btree, GinBtreeStack *stack,
 /*
  * Lock buffer by needed method for search.
  */
-static int
+int
 ginTraverseLock(Buffer buffer, bool searchMode)
 {
 	Page		page;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 2685a1c..b3ebd95 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -108,75 +108,17 @@ xlogVacuumPage(Relation index, Buffer buffer)
 	PageSetLSN(page, recptr);
 }
 
-static bool
-ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, Buffer *rootBuffer)
-{
-	Buffer		buffer;
-	Page		page;
-	bool		hasVoidPage = FALSE;
-	MemoryContext oldCxt;
-
-	buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
-								RBM_NORMAL, gvs->strategy);
-	page = BufferGetPage(buffer);
-
-	/*
-	 * We should be sure that we don't concurrent with inserts, insert process
-	 * never release root page until end (but it can unlock it and lock
-	 * again). New scan can't start but previously started ones work
-	 * concurrently.
-	 */
-	if (isRoot)
-		LockBufferForCleanup(buffer);
-	else
-		LockBuffer(buffer, GIN_EXCLUSIVE);
-
-	Assert(GinPageIsData(page));
-
-	if (GinPageIsLeaf(page))
-	{
-		oldCxt = MemoryContextSwitchTo(gvs->tmpCxt);
-		ginVacuumPostingTreeLeaf(gvs->index, buffer, gvs);
-		MemoryContextSwitchTo(oldCxt);
-		MemoryContextReset(gvs->tmpCxt);
-
-		/* if root is a leaf page, we don't desire further processing */
-		if (!isRoot && !hasVoidPage && GinDataLeafPageIsEmpty(page))
-			hasVoidPage = TRUE;
-	}
-	else
-	{
-		OffsetNumber i;
-		bool		isChildHasVoid = FALSE;
-
-		for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->maxoff; i++)
-		{
-			PostingItem *pitem = GinDataPageGetPostingItem(page, i);
 
-			if (ginVacuumPostingTreeLeaves(gvs, PostingItemGetBlockNumber(pitem), FALSE, NULL))
-				isChildHasVoid = TRUE;
-		}
-
-		if (isChildHasVoid)
-			hasVoidPage = TRUE;
-	}
+typedef struct DataPageDeleteStack
+{
+	struct DataPageDeleteStack *child;
+	struct DataPageDeleteStack *parent;
 
-	/*
-	 * if we have root and there are empty pages in tree, then we don't
-	 * release lock to go further processing and guarantee that tree is unused
-	 */
-	if (!(isRoot && hasVoidPage))
-	{
-		UnlockReleaseBuffer(buffer);
-	}
-	else
-	{
-		Assert(rootBuffer);
-		*rootBuffer = buffer;
-	}
+	BlockNumber blkno;			/* current block number */
+	BlockNumber leftBlkno;		/* rightest non-deleted page on left */
+	bool		isRoot;
+} DataPageDeleteStack;
 
-	return hasVoidPage;
-}
 
 /*
  * Delete a posting tree page.
@@ -193,8 +135,13 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 	BlockNumber rightlink;
 
 	/*
-	 * Lock the pages in the same order as an insertion would, to avoid
-	 * deadlocks: left, then right, then parent.
+	 * This function MUST be called only if someone of parent pages hold
+	 * exclusive cleanup lock. This guarantees that no insertions currently
+	 * happen in this subtree. Caller also acquire Exclusive lock on deletable
+	 * page and is acquiring and releasing exclusive lock on left page before.
+	 * Left page was locked and released. Then parent and this page are locked.
+	 * We acquire left page lock here only to mark page dirty after changing
+	 * right pointer.
 	 */
 	lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, leftBlkno,
 								 RBM_NORMAL, gvs->strategy);
@@ -204,10 +151,6 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 								 RBM_NORMAL, gvs->strategy);
 
 	LockBuffer(lBuffer, GIN_EXCLUSIVE);
-	LockBuffer(dBuffer, GIN_EXCLUSIVE);
-	if (!isParentRoot)			/* parent is already locked by
-								 * LockBufferForCleanup() */
-		LockBuffer(pBuffer, GIN_EXCLUSIVE);
 
 	START_CRIT_SECTION();
 
@@ -271,26 +214,15 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 		PageSetLSN(BufferGetPage(lBuffer), recptr);
 	}
 
-	if (!isParentRoot)
-		LockBuffer(pBuffer, GIN_UNLOCK);
 	ReleaseBuffer(pBuffer);
 	UnlockReleaseBuffer(lBuffer);
-	UnlockReleaseBuffer(dBuffer);
+	ReleaseBuffer(dBuffer);
 
 	END_CRIT_SECTION();
 
 	gvs->result->pages_deleted++;
 }
 
-typedef struct DataPageDeleteStack
-{
-	struct DataPageDeleteStack *child;
-	struct DataPageDeleteStack *parent;
-
-	BlockNumber blkno;			/* current block number */
-	BlockNumber leftBlkno;		/* rightest non-deleted page on left */
-	bool		isRoot;
-} DataPageDeleteStack;
 
 /*
  * scans posting tree and deletes empty pages
@@ -324,6 +256,10 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
 
 	buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
 								RBM_NORMAL, gvs->strategy);
+
+	if(!isRoot)
+		LockBuffer(buffer, GIN_EXCLUSIVE);
+
 	page = BufferGetPage(buffer);
 
 	Assert(GinPageIsData(page));
@@ -358,6 +294,9 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
 		}
 	}
 
+	if(!isRoot)
+			LockBuffer(buffer, GIN_UNLOCK);
+
 	ReleaseBuffer(buffer);
 
 	if (!meDelete)
@@ -366,37 +305,124 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
 	return meDelete;
 }
 
-static void
-ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
+
+/*
+ * Scan through posting tree, delete empty tuples from leaf pages.
+ * Also, this function collects empty subtrees (with all empty leafs).
+ * For parents of these subtrees CleanUp lock is taken, then we call
+ * ScanToDelete. This is done for every inner page, which points to
+ * empty subtree.
+ */
+static bool
+ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot)
 {
-	Buffer		rootBuffer = InvalidBuffer;
-	DataPageDeleteStack root,
-			   *ptr,
-			   *tmp;
+	Buffer		buffer;
+	Page		page;
+	bool		hasVoidPage = FALSE;
+	MemoryContext oldCxt;
 
-	if (ginVacuumPostingTreeLeaves(gvs, rootBlkno, TRUE, &rootBuffer) == FALSE)
+	buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
+								RBM_NORMAL, gvs->strategy);
+	page = BufferGetPage(buffer);
+
+	ginTraverseLock(buffer,false);
+
+	Assert(GinPageIsData(page));
+
+	if (GinPageIsLeaf(page))
 	{
-		Assert(rootBuffer == InvalidBuffer);
-		return;
+		oldCxt = MemoryContextSwitchTo(gvs->tmpCxt);
+		ginVacuumPostingTreeLeaf(gvs->index, buffer, gvs);
+		MemoryContextSwitchTo(oldCxt);
+		MemoryContextReset(gvs->tmpCxt);
+
+		/* if root is a leaf page, we don't desire further processing */
+		if (GinDataLeafPageIsEmpty(page))
+			hasVoidPage = TRUE;
+
+		UnlockReleaseBuffer(buffer);
+
+		return hasVoidPage;
 	}
+	else
+	{
+		OffsetNumber 	i;
+		bool			isChildHasVoid = FALSE;
+		bool			isAnyNonempy = FALSE;
+		OffsetNumber	maxoff = GinPageGetOpaque(page)->maxoff;
+		BlockNumber*	children = palloc(sizeof(BlockNumber) * (maxoff + 1));
 
-	memset(&root, 0, sizeof(DataPageDeleteStack));
-	root.leftBlkno = InvalidBlockNumber;
-	root.isRoot = TRUE;
+		/*
+		 * Read all children BlockNumbers.
+		 * Not sure it is safe if there are many concurrent vacuums.
+		 */
 
-	vacuum_delay_point();
+		for (i = FirstOffsetNumber; i <= maxoff; i++)
+		{
+			PostingItem *pitem = GinDataPageGetPostingItem(page, i);
 
-	ginScanToDelete(gvs, rootBlkno, TRUE, &root, InvalidOffsetNumber);
+			children[i] = PostingItemGetBlockNumber(pitem);
+		}
 
-	ptr = root.child;
-	while (ptr)
-	{
-		tmp = ptr->child;
-		pfree(ptr);
-		ptr = tmp;
+		UnlockReleaseBuffer(buffer);
+
+		for (i = FirstOffsetNumber; i <= maxoff; i++)
+		{
+			if (ginVacuumPostingTreeLeaves(gvs, children[i], FALSE))
+				isChildHasVoid = TRUE;
+			else
+				isAnyNonempy = TRUE;
+		}
+
+		pfree(children);
+
+		vacuum_delay_point();
+
+		/*
+		 * All subtree is empty - just return TRUE to indicate that parent must
+		 * do a cleanup. Unless we are ROOT an there is way to go upper.
+		 */
+
+		if(isChildHasVoid && !isAnyNonempy && !isRoot)
+			return TRUE;
+
+		if(isChildHasVoid)
+		{
+			DataPageDeleteStack root,
+					   *ptr,
+					   *tmp;
+
+			buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
+											RBM_NORMAL, gvs->strategy);
+			LockBufferForCleanup(buffer);
+
+			memset(&root, 0, sizeof(DataPageDeleteStack));
+				root.leftBlkno = InvalidBlockNumber;
+				root.isRoot = TRUE;
+
+			ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);
+
+			ptr = root.child;
+
+			while (ptr)
+			{
+				tmp = ptr->child;
+				pfree(ptr);
+				ptr = tmp;
+			}
+
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Here we have deleted all empty subtrees */
+		return FALSE;
 	}
+}
 
-	UnlockReleaseBuffer(rootBuffer);
+static void
+ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
+{
+	ginVacuumPostingTreeLeaves(gvs, rootBlkno, TRUE);
 }
 
 /*
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bf589ab..b738f47 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -981,4 +981,6 @@ ginCompareItemPointers(ItemPointer a, ItemPointer b)
 		return -1;
 }
 
+extern int ginTraverseLock(Buffer buffer, bool searchMode);
+
 #endif   /* GIN_PRIVATE_H */
#3Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#1)
Re: Review: GIN non-intrusive vacuum of posting tree

Hello Jeff!

Review of the code itself:

How do you think, is there anything else to improve in that patch or
we can mark it as 'Ready for committer'?

I've checked the concurrency algorithm with original work of Lehman
and Yao on B-link-tree. For me everything looks fine (safe, deadlock
free and not increased probability of livelock due to
LockBufferForCleanup call).

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jeff Davis
pgsql@j-davis.com
In reply to: Andrew Borodin (#3)
Re: Review: GIN non-intrusive vacuum of posting tree

On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:

Hello Jeff!

Review of the code itself:

How do you think, is there anything else to improve in that patch or
we can mark it as 'Ready for committer'?

I've checked the concurrency algorithm with original work of Lehman
and Yao on B-link-tree. For me everything looks fine (safe, deadlock
free and not increased probability of livelock due to
LockBufferForCleanup call).

Hi,

I looked at the patch some more.

I have some reservations about the complexity and novelty of the
approach. I don't think I've seen an approach quite like this one
before. It seems like the main reason you are using this approach is
because the btree structure was too simple (no left links); is that
right? My gut is telling me we should either leave it simple, or use a
real concurrent btree implementation.

One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Another idea is to partition the posting tree so that the root page
actually points to K posting trees. Scans would need to merge them,
but it would allow inserts to progress in one tree while the other is
being vacuumed. I think this would give good concurrency even for K=2.
I just had this idea now, so I didn't think it through very well.

What do you think?

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#4)
Re: Review: GIN non-intrusive vacuum of posting tree

Hi!

2017-01-23 11:32 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

I have some reservations about the complexity and novelty of the
approach. I don't think I've seen an approach quite like this one
before. It seems like the main reason you are using this approach is
because the btree structure was too simple (no left links); is that
right? My gut is telling me we should either leave it simple, or use a
real concurrent btree implementation.

GIN B-tree is departed from backend\access\nbtree almost like nbtree
is departed from L&Y algorithm. As far as I understand there is no
"high key" concept as in nbtree. I'm not sure that it's possible to
implement same level of cuncurrency as in nbtree without that.
Also GIN B-tree is actually two different B-trees, large portions of
code is shared between Entries tree and Postings tree. I'm not sure
going fully concurrent vacuum worth it, there will be a lot of
changes. And then there is compression...installing high keys into GIN
may be a mess, with pg_upgrade.

Proposed patch, on a contrary, simplifies code. There is more code
deleted than added. It does not affect WAL, changes nothing in page
layout.

One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

As far as I recall, this is resemplant to Lenin and Shasha algorithm.
I'll look into it, but I think it relies on concept of "high key" on a
page somehow (high key is the key from a sibling page, not local
rightmost key as GIN B-tree uses).

Another idea is to partition the posting tree so that the root page
actually points to K posting trees. Scans would need to merge them,
but it would allow inserts to progress in one tree while the other is
being vacuumed. I think this would give good concurrency even for K=2.
I just had this idea now, so I didn't think it through very well.

This is efficient from a point of view of frequent vacuums.
concurrency of same key inserts can benefit from this approach. But
all aother operations are more complicated and less efficient.
GIN already has Fast Update list to tackle problems in this manner.
But I do not feel I have resources to try to implement it...

Thank you for your considerations. I'll think about concurrency and
"high keys" more, but I'm realy afraid of amount of changes which may
be prerequisites.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Jeff Davis
pgsql@j-davis.com
In reply to: Andrew Borodin (#5)
Re: Review: GIN non-intrusive vacuum of posting tree

On Mon, Jan 23, 2017 at 1:55 AM, Andrew Borodin <borodin@octonica.com> wrote:

Proposed patch, on a contrary, simplifies code. There is more code
deleted than added. It does not affect WAL, changes nothing in page
layout.

I appreciate that it is fewer lines of code. My hesitation comes from
a couple areas:

1. I haven't seen the approach before of finding the parent of any
empty subtree. While that sounds like a reasonable thing to do, it
worries me to invent new approaches in an area as well-studied as
btrees. The problem is less that it's too complex, and more that it's
different. Future developers looking at this code will expect it to be
either very simple or very standard, because it's just a posting list.

2. It relies on searches to only start from the very leftmost page
(correct?). If someone wants to optimize the intersection of two
posting trees later, they might break it if they don't understand the
assumptions in this patch.

3. This adds a heap allocation, and copies the contents of the page,
and then releases the pin and lock. GinStepRight is careful to avoid
doing that (it locks the target before releasing the page containing
the pointer). I don't see a problem in your patch, but again, we are
breaking an assumption that future developers might make.

Your patch solves a real problem (a 90-second stall is clearly not
good) and I don't want to dismiss that. But I'd like to consider some
alternatives that may not have these downsides.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#6)
Re: Review: GIN non-intrusive vacuum of posting tree

Hello!

I see your point on alternatives. I will do my best to generate more
ideas on how vacuum can be done withing existing index structure, this
will take a day or two.
In this message I'll answer some questions.

2017-01-23 22:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

1. I haven't seen the approach before of finding the parent of any
empty subtree. While that sounds like a reasonable thing to do, it
worries me to invent new approaches in an area as well-studied as
btrees. The problem is less that it's too complex, and more that it's
different. Future developers looking at this code will expect it to be
either very simple or very standard, because it's just a posting list.

Technically, approach of locking a subtree is not novel. Lehman and
Yao focused on "that any process for manipulating the tree uses only a
small (constant) number of locks at any time." We are locking unknown
and possibly large amount of pages.

2. It relies on searches to only start from the very leftmost page
(correct?). If someone wants to optimize the intersection of two
posting trees later, they might break it if they don't understand the
assumptions in this patch.

Searches start from root, to find a correct position.
But cuncurrency of the patch does not depend on that.
Patch relies on the idea, that at any given subtree, if:
1. There is no inserts within subtree (guaranteed by cleanup lock)
2. Every page was locked with Exclusive lock at least once, locks were
taken from left to right, without lock coupling.
than: there is no read searches within the subtree.

3. This adds a heap allocation, and copies the contents of the page,
and then releases the pin and lock. GinStepRight is careful to avoid
doing that (it locks the target before releasing the page containing
the pointer). I don't see a problem in your patch, but again, we are
breaking an assumption that future developers might make.

Yes, heap allocation worries me a lot too, but let me explain details.

GinStepRight - is the place where lock coupling is done. That is, you
lock rightlink before releasing already locked page.
Generally, B-tree tends to lock just one page at a time, but this
place is an important exception (not the only exception, but others
behave similarly).

Proposed patch is doing tree traversal at worst two times.

First: scan trough the tree, lock internal pages for read, lock leafs
for write, delete dead tuples. If empty subtrees are found, invoke for
them second stage.
Here children Block Numbers are stored in palloc`d array, to do this
scan with minimum of locks. This is unacceptable if there may be
concurrent VACUUM, which can delete a page when we are asleep for some
reason. (My point of worring number 1)

Second: this stage recives subtree containing a lot of pages which
must be just deleted. Here we take cleanup lock on subtree root,
ensuring no inserts are within (they hold pins on stack from root to
leaf). Than we take exclusive locks on all pages from left to right,
without lock coupling. When we find empty page, we lock left page too,
thus doing lock coupling form right to left. If search might be wihtin
a subtree, it could deadlock with us. (My point of worring number 2)

But both points seems to have reasonable explanation why it cannot happen.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jeff Davis
pgsql@j-davis.com
In reply to: Andrew Borodin (#7)
Re: Review: GIN non-intrusive vacuum of posting tree

On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin <borodin@octonica.com> wrote:

Technically, approach of locking a subtree is not novel. Lehman and
Yao focused on "that any process for manipulating the tree uses only a
small (constant) number of locks at any time." We are locking unknown
and possibly large amount of pages.

By the way, can you show me where the Lehman and Yao paper addresses
page recycling?

It says that one approach is to allow fewer than K entries on a leaf
node; presumably as few as zero. But it doesn't seem to show how to
remove all references to the page and recycle it in a new place in the
tree.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#8)
Re: Review: GIN non-intrusive vacuum of posting tree

2017-01-24 22:29 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin <borodin@octonica.com> wrote:

Technically, approach of locking a subtree is not novel. Lehman and
Yao focused on "that any process for manipulating the tree uses only a
small (constant) number of locks at any time." We are locking unknown
and possibly large amount of pages.

By the way, can you show me where the Lehman and Yao paper addresses
page recycling?

It says that one approach is to allow fewer than K entries on a leaf
node; presumably as few as zero. But it doesn't seem to show how to
remove all references to the page and recycle it in a new place in the
tree.

Regards,
Jeff Davis

Here J. Hellerstein comments L&Y paper [1]http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html saying that effectively
there is no deletion algorithm provided.
Here [2]http://dl.acm.org/citation.cfm?id=324589 is paper referenced from nbtree docs. I'll check if this is
applicable to GIN B-tree.

[1]: http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
[2]: http://dl.acm.org/citation.cfm?id=324589

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andrew Borodin
borodin@octonica.com
In reply to: Andrew Borodin (#9)
Re: Review: GIN non-intrusive vacuum of posting tree

2017-01-25 12:09 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

2017-01-24 22:29 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

By the way, can you show me where the Lehman and Yao paper addresses
page recycling?

Here J. Hellerstein comments L&Y paper [1] saying that effectively
there is no deletion algorithm provided.
Here [2] is paper referenced from nbtree docs. I'll check if this is
applicable to GIN B-tree.

[1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
[2] http://dl.acm.org/citation.cfm?id=324589

Hi!

I'll summarize here my state of studying concurrent methods of page unlinking.

GIN B-tree does not have "high key". That means, that rightmost key on
a page is maximal for the non-leaf page.
But I do not see anything theoretical in a way of implementation of
Lanin and Shasha`s methods of page merging, with slight modifications.
Their paper does not even mention high key(high fence key in papers by
Goetz Graefe).

But it's not a simple task due to large portions of shared code
between entry tree and posting tree.

Also, I do not see a reason why this method can be practically
superior to proposed patch.

Currently, I do not have resources to implement a proof of concept for
fully concurrent page unlinking to make benchmarking.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Michael Paquier
michael.paquier@gmail.com
In reply to: Andrew Borodin (#10)
Re: Review: GIN non-intrusive vacuum of posting tree

On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin <borodin@octonica.com> wrote:

I'll summarize here my state of studying concurrent methods of page unlinking.

GIN B-tree does not have "high key". That means, that rightmost key on
a page is maximal for the non-leaf page.
But I do not see anything theoretical in a way of implementation of
Lanin and Shasha`s methods of page merging, with slight modifications.
Their paper does not even mention high key(high fence key in papers by
Goetz Graefe).

But it's not a simple task due to large portions of shared code
between entry tree and posting tree.

Also, I do not see a reason why this method can be practically
superior to proposed patch.

Currently, I do not have resources to implement a proof of concept for
fully concurrent page unlinking to make benchmarking.

I am marking this patch as returned with feedback.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Vladimir Borodin
root@simply.name
In reply to: Michael Paquier (#11)
Re: Review: GIN non-intrusive vacuum of posting tree

31 янв. 2017 г., в 9:50, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin <borodin@octonica.com> wrote:

I'll summarize here my state of studying concurrent methods of page unlinking.

GIN B-tree does not have "high key". That means, that rightmost key on
a page is maximal for the non-leaf page.
But I do not see anything theoretical in a way of implementation of
Lanin and Shasha`s methods of page merging, with slight modifications.
Their paper does not even mention high key(high fence key in papers by
Goetz Graefe).

But it's not a simple task due to large portions of shared code
between entry tree and posting tree.

Also, I do not see a reason why this method can be practically
superior to proposed patch.

Currently, I do not have resources to implement a proof of concept for
fully concurrent page unlinking to make benchmarking.

I am marking this patch as returned with feedback.

Michael, sorry, but why? If I understood everything right, the main question from Jeff was why is it implemented in such way? And Jeff wanted to see other ways of solving the problem. Andrew wrote about them above and it seems that implementing them would be quite expensive and without any obvious win. I would rather expect some reaction from Jeff or may be someone else, so shouldn’t it be marked as «Ready for committer» or at least «Moved to next CF»?

--
Michael

--
May the force be with you…
https://simply.name

#13Michael Paquier
michael.paquier@gmail.com
In reply to: Vladimir Borodin (#12)
Re: Review: GIN non-intrusive vacuum of posting tree

On Tue, Jan 31, 2017 at 4:31 PM, Vladimir Borodin <root@simply.name> wrote:

31 янв. 2017 г., в 9:50, Michael Paquier <michael.paquier@gmail.com>
написал(а):

I am marking this patch as returned with feedback.

Michael, sorry, but why?

Because I have been through many patches today.

If I understood everything right, the main question
from Jeff was why is it implemented in such way? And Jeff wanted to see
other ways of solving the problem. Andrew wrote about them above and it
seems that implementing them would be quite expensive and without any
obvious win. I would rather expect some reaction from Jeff or may be someone
else, so shouldn’t it be marked as «Ready for committer» or at least «Moved
to next CF»?

I have moved to to the next CF.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#4)
Re: Review: GIN non-intrusive vacuum of posting tree

On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Do you think this approach is viable as a simplification?

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andrew Borodin
borodin@octonica.com
In reply to: Jeff Davis (#14)
Re: Review: GIN non-intrusive vacuum of posting tree

Hi, Jeff!

2017-02-05 3:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Do you think this approach is viable as a simplification?

To consider this approach I need to ask several questions.

1. Are we discussing simplification of existing GIN vacuum? Or
proposed GIN vacuum? Please, note that they do not differ in the way
page is unlinked, function ginDeletePage() is almost untoched.

2. What do you mean by "stage"? In terms of the paper "A symmetric
concurrent B-tree algorithm" by Lanin&Shasha: stage is an
uninterrupted period of holding lock on nonempty page set.
Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S
Both paper (L&Y and L&S) tend to avoid lock coupling, which is
inevitable if you want to do parent unlink first. To prevent insertion
of records on a page, you have to mark it. If you are doing this in
the stage when you unlink from parent - you have to own both locks.

3. What do you want to simplify? Currently we unlink a page from
parent and left sibling in one stage, at cost of aquiring CleanUp lock
(the way we aquire it - is the diference between current and patched
version).
2-stage algorithms will not be simplier, yet it will be more concurrent.
Please note, that during absence of high fence keys in GIN B-tree we
actually should implement algorithm from figure 3A
https://www.screencast.com/t/2cfGZtrzaz0z (It would be incorrect, but
only with presence of high keys)

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16David Steele
david@pgmasters.net
In reply to: Andrew Borodin (#15)
Re: Review: GIN non-intrusive vacuum of posting tree

On 2/5/17 11:04 AM, Andrew Borodin wrote:

Hi, Jeff!

2017-02-05 3:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:

On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Do you think this approach is viable as a simplification?

To consider this approach I need to ask several questions.

1. Are we discussing simplification of existing GIN vacuum? Or
proposed GIN vacuum? Please, note that they do not differ in the way
page is unlinked, function ginDeletePage() is almost untoched.

2. What do you mean by "stage"? In terms of the paper "A symmetric
concurrent B-tree algorithm" by Lanin&Shasha: stage is an
uninterrupted period of holding lock on nonempty page set.
Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S
Both paper (L&Y and L&S) tend to avoid lock coupling, which is
inevitable if you want to do parent unlink first. To prevent insertion
of records on a page, you have to mark it. If you are doing this in
the stage when you unlink from parent - you have to own both locks.

3. What do you want to simplify? Currently we unlink a page from
parent and left sibling in one stage, at cost of aquiring CleanUp lock
(the way we aquire it - is the diference between current and patched
version).
2-stage algorithms will not be simplier, yet it will be more concurrent.
Please note, that during absence of high fence keys in GIN B-tree we
actually should implement algorithm from figure 3A
https://www.screencast.com/t/2cfGZtrzaz0z (It would be incorrect, but
only with presence of high keys)

This patch applies cleanly and compiles at cccbdde.

Jeff, any thoughts on Andrew's responses?

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andrew Borodin
borodin@octonica.com
In reply to: David Steele (#16)
Re: Review: GIN non-intrusive vacuum of posting tree

2017-03-16 21:27 GMT+05:00 David Steele <david@pgmasters.net>:

This patch applies cleanly and compiles at cccbdde.

Jeff, any thoughts on Andrew's responses?

Hi, David!

I've got some updates on the matter of this patch, since the
understanding of the B-tree bothered me much.
Currently, I'm at PgConf.Russia, where I've contacted Theodor Sigaev,
and he answered my questions about the GIN.
0. I think that proposed patch is safe (deadlock free, does not
introduce new livelocks, all the resources guarded properly)
1. There _are_ high keys at the posting trees, they are just called
rightmost keys, but in fact they are high keys in terms of L&Y
algorithm.
2. Thus, L&S fully concurrent vacuum is possible, indeed, and
furthermore Theodor suggested that I should implement not only page
eviction, but also page merge and tree condence algorithm.
3. Eventually, I'll do that, certainly, but, currently, I can't
predict the time it'll take. I think I'll start somewhere in the
summer, may be right after GiST intrapage indexing.

As for now, I think that having this patch in PostgreSQL 10 is viable.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andrew Borodin (#17)
Re: Review: GIN non-intrusive vacuum of posting tree

On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin <borodin@octonica.com> wrote:

2. Thus, L&S fully concurrent vacuum is possible, indeed, and
furthermore Theodor suggested that I should implement not only page
eviction, but also page merge and tree condence algorithm.

I think that it's very hard to make merging of pages that are not
completely empty work, while also using the L&Y algorithm.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andrew Borodin
borodin@octonica.com
In reply to: Peter Geoghegan (#18)
Re: Review: GIN non-intrusive vacuum of posting tree

2017-03-16 23:55 GMT+05:00 Peter Geoghegan <pg@bowt.ie>:

On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin <borodin@octonica.com> wrote:

2. Thus, L&S fully concurrent vacuum is possible, indeed, and
furthermore Theodor suggested that I should implement not only page
eviction, but also page merge and tree condence algorithm.

I think that it's very hard to make merging of pages that are not
completely empty work, while also using the L&Y algorithm.

That's true. This is a distant plan...

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Teodor Sigaev
teodor@sigaev.ru
In reply to: Andrew Borodin (#2)
Re: Review: GIN non-intrusive vacuum of posting tree

Thank you for your suggestions, do not hesitate to ask any questions,
concurrency and GIN both are very interesting topics.

I had a look on patch and found some issue.
Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves():

/*
* All subtree is empty - just return TRUE to indicate that parent must
* do a cleanup. Unless we are ROOT an there is way to go upper.
*/

if(isChildHasVoid && !isAnyNonempy && !isRoot)
return TRUE;

if(isChildHasVoid)
{
...
ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);
}

In first 'if' clause I see !isRoot, so second if and corresponding
ginScanToDelete() could be called only for root in posting tree. If so, it seems
to me, it wasn't a good idea to move ginScanToDelete() from
ginVacuumPostingTree() and patch should just remove lock for cleanup over
ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page
then lock the whole posting tree to do ginScanToDelete() work.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Andrew Borodin
borodin@octonica.com
In reply to: Teodor Sigaev (#20)
Re: Review: GIN non-intrusive vacuum of posting tree

Hi, Teodor!

2017-03-21 20:32 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:

I had a look on patch

That's great, thanks!

/*
* All subtree is empty - just return TRUE to indicate that parent
must
* do a cleanup. Unless we are ROOT an there is way to go upper.
*/

if(isChildHasVoid && !isAnyNonempy && !isRoot)
return TRUE;

if(isChildHasVoid)
{
...
ginScanToDelete(gvs, blkno, TRUE, &root,
InvalidOffsetNumber);
}

In first 'if' clause I see !isRoot, so second if and corresponding
ginScanToDelete() could be called only for root in posting tree.

No, second conditional code will be called for any subtree, which
contains totally empty subtree. That check !isRoot covers case when
the entire posting tree should be erased: we cannot just quit out of
recursive cleanup, we have to make a scan here, starting from root.

Probably, variable isChildHasVoid has a bit confusing name. This flag
indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is
a part of a bigger empty subtree.
May be it'd be better to call the variable "someChildIsVoidSubtree".

just remove lock for cleanup over ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page then lock the whole posting tree to do ginScanToDelete() work.

It is, indeed, viable approach. But currently proposed patch is
capable of dealing with small page deletes without much of locking
fuss, even in 4-5 level trees.

How do you think, which way should we take?

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Teodor Sigaev
teodor@sigaev.ru
In reply to: Andrew Borodin (#21)
Re: Review: GIN non-intrusive vacuum of posting tree

No, second conditional code will be called for any subtree, which
contains totally empty subtree. That check !isRoot covers case when
the entire posting tree should be erased: we cannot just quit out of
recursive cleanup, we have to make a scan here, starting from root.

Oh, I see

Probably, variable isChildHasVoid has a bit confusing name. This flag
indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is
a part of a bigger empty subtree.
May be it'd be better to call the variable "someChildIsVoidSubtree".

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

And if the whole posting tree is empty,then we could mark root page as leaf and
remove all other pages in tree without any locking. Although, it could be a task
for separate patch.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Andrew Borodin
borodin@octonica.com
In reply to: Teodor Sigaev (#22)
Re: Review: GIN non-intrusive vacuum of posting tree

2017-03-22 22:48 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.

And if the whole posting tree is empty,then we could mark root page as leaf
and remove all other pages in tree without any locking. Although, it could
be a task for separate patch.

From the performance point of view, this is a very good idea. Both,
performance of VACUUM and performance of Scans. But doing so we risk
to leave some garbage pages in case of a crash. And I do not see how
to avoid these without unlinking pages one by one. I agree, that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Teodor Sigaev
teodor@sigaev.ru
In reply to: Andrew Borodin (#23)
Re: Review: GIN non-intrusive vacuum of posting tree

Thank you, pushed

Andrew Borodin wrote:

2017-03-22 22:48 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.

And if the whole posting tree is empty,then we could mark root page as leaf
and remove all other pages in tree without any locking. Although, it could
be a task for separate patch.

From the performance point of view, this is a very good idea. Both,
performance of VACUUM and performance of Scans. But doing so we risk
to leave some garbage pages in case of a crash. And I do not see how
to avoid these without unlinking pages one by one. I agree, that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers