Amcheck: do rightlink verification with lock coupling

Started by Andrey Borodinover 6 years ago21 messages
#1Andrey Borodin
x4mmm@yandex-team.ru
1 attachment(s)

Hi!

This is a thread to discuss amcheck feature started in other thread[0]/messages/by-id/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru.

Currently amcheck is scanning every B-tree level. If verification is done with ShareLock - amcheck will test that each page leftlink is pointing to page with rightlink backwards.
This is important invariant, in our experience it proved to be good at detecting various corruptions.
But this invariant can be detected only if both pages are not modified (e.g. split concurrently).

PFA patch, that in case of suspicion locks two pages and retests that invariant. This allows detection of several corruptions on standby.

This patch violates one of amcheck design principles: current code does not ever take more than one page lock. I do not know: should we hold this rule or should we use more deep check?

Thanks!

Best regards, Andrey Borodin.

[0]: /messages/by-id/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru

Attachments:

v2-0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patchapplication/octet-stream; name=v2-0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patch; x-unix-mode=0644Download
From 9fa9b27a550c2202c7c5fc0185d9db5b7c86585e Mon Sep 17 00:00:00 2001
From: Andrey <amborodin@acm.org>
Date: Thu, 15 Aug 2019 16:43:16 +0300
Subject: [PATCH] In amcheck nbtree do rightlink verification with lock
 coupling

When doing nbtree verification we ignore rightlink-leftling invariant violations
unless ShareLock is taken. To ensure detection of corruption this behaiovr is changed.
With this commit we couple page locks and recheck links consistency again.
---
 contrib/amcheck/verify_nbtree.c | 41 +++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 05e7d678ed..665a37d05a 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -133,6 +133,7 @@ static inline bool btree_index_mainfork_expected(Relation rel);
 static void bt_check_every_level(Relation rel, Relation heaprel,
 								 bool heapkeyspace, bool readonly, bool heapallindexed,
 								 bool rootdescend);
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
 											   BtreeLevel level);
 static void bt_target_page_check(BtreeCheckState *state);
@@ -628,6 +629,42 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 	MemoryContextDelete(state->targetcontext);
 }
 
+/*
+ * Verify that coherence of rightling's leftlink under lock
+ */
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno)
+{
+	Buffer		lbuffer, rbuffer;
+	Page		lpage,rpage;
+	BTPageOpaque lopaque, ropaque;
+
+	/* Read and lock left block */
+	lbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftblockno, RBM_NORMAL,
+								state->checkstrategy);
+	LockBuffer(lbuffer, BT_READ);
+	lpage = BufferGetPage(lbuffer);
+	lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
+
+	/* Read right block by following lopaque->btpo_next of left block */
+	rbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, lopaque->btpo_next, RBM_NORMAL,
+								 state->checkstrategy);
+	/* Here we are going to couple locks on left block and right block */
+	LockBuffer(rbuffer, BT_READ);
+	rpage = BufferGetPage(rbuffer);
+	ropaque = (BTPageOpaque) PageGetSpecialPointer(rpage);
+
+	if (ropaque->btpo_prev != leftblockno)
+		ereport(ERROR,
+				(errcode(ERRCODE_INDEX_CORRUPTED),
+				 errmsg("left link/right link pair in index \"%s\" not in agreement",
+						RelationGetRelationName(state->rel)),
+				 errdetail_internal("Block=%u left block=%u left link from block=%u.",
+									leftblockno, lopaque->btpo_next, ropaque->btpo_prev)));
+
+	UnlockReleaseBuffer(lbuffer);
+	UnlockReleaseBuffer(rbuffer);
+}
+
 /*
  * Given a left-most block at some level, move right, verifying each page
  * individually (with more verification across pages for "readonly"
@@ -783,6 +820,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 		/*
 		 * readonly mode can only ever land on live pages and half-dead pages,
 		 * so sibling pointers should always be in mutual agreement
+		 * if not in readonly mode - we have to recheck links under lock
 		 */
 		if (state->readonly && opaque->btpo_prev != leftcurrent)
 			ereport(ERROR,
@@ -791,6 +829,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 							RelationGetRelationName(state->rel)),
 					 errdetail_internal("Block=%u left block=%u left link from block=%u.",
 										current, leftcurrent, opaque->btpo_prev)));
+		else if (opaque->btpo_prev != leftcurrent && level.level == 0)
+		/* on leaf level - recheck rightlinks */
+			bt_recheck_block_rightlink(state, leftcurrent);
 
 		/* Check level, which must be valid for non-ignorable page */
 		if (level.level != opaque->btpo.level)
-- 
2.20.1

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrey Borodin (#1)
Re: Amcheck: do rightlink verification with lock coupling

On 2019-Sep-12, Andrey Borodin wrote:

This patch violates one of amcheck design principles: current code
does not ever take more than one page lock. I do not know: should we
hold this rule or should we use more deep check?

The check does seem worthwhile to me.

As far as I know, in btree you can lock the right sibling of a page
while keeping lock on the page itself, without risk of deadlock. So I'm
not sure what's the issue with that. This is in the README:

In most cases we release our lock and pin on a page before attempting
to acquire pin and lock on the page we are moving to. In a few places
it is necessary to lock the next page before releasing the current one.
This is safe when moving right or up, but not when moving left or down
(else we'd create the possibility of deadlocks).

I suppose Peter was concerned about being able to run amcheck without
causing any trouble at all for concurrent operation; maybe we can retain
that property by making this check optional.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#2)
Re: Amcheck: do rightlink verification with lock coupling

On Thu, Sep 12, 2019 at 10:16:12AM -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Andrey Borodin wrote:

This patch violates one of amcheck design principles: current code
does not ever take more than one page lock. I do not know: should we
hold this rule or should we use more deep check?

The check does seem worthwhile to me.

As far as I know, in btree you can lock the right sibling of a page
while keeping lock on the page itself, without risk of deadlock. So I'm
not sure what's the issue with that. This is in the README:

In most cases we release our lock and pin on a page before attempting
to acquire pin and lock on the page we are moving to. In a few places
it is necessary to lock the next page before releasing the current one.
This is safe when moving right or up, but not when moving left or down
(else we'd create the possibility of deadlocks).

I suppose Peter was concerned about being able to run amcheck without
causing any trouble at all for concurrent operation; maybe we can retain
that property by making this check optional.

Peter, any opinion on this proposed amcheck patch? In the other thread
[1]: /messages/by-id/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru
make this check optional seems like a reasonable compromise with respect
to the locking.

[1]: /messages/by-id/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Tomas Vondra (#3)
Re: Amcheck: do rightlink verification with lock coupling

On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Peter, any opinion on this proposed amcheck patch? In the other thread
[1] you seemed to agree this is worth checking, and Alvaro's proposal to
make this check optional seems like a reasonable compromise with respect
to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

--
Peter Geoghegan

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Geoghegan (#4)
Re: Amcheck: do rightlink verification with lock coupling

On Fri, Jan 10, 2020 at 06:49:33PM -0800, Peter Geoghegan wrote:

On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Peter, any opinion on this proposed amcheck patch? In the other thread
[1] you seemed to agree this is worth checking, and Alvaro's proposal to
make this check optional seems like a reasonable compromise with respect
to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.

OK, that makes sense.

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

Understood. Is that a reason to not commit of this patch now, though?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Tomas Vondra (#5)
Re: Amcheck: do rightlink verification with lock coupling

On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Understood. Is that a reason to not commit of this patch now, though?

It could use some polishing. Are you interested in committing it?

--
Peter Geoghegan

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Geoghegan (#6)
Re: Amcheck: do rightlink verification with lock coupling

On Mon, Jan 13, 2020 at 03:49:40PM -0800, Peter Geoghegan wrote:

On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Understood. Is that a reason to not commit of this patch now, though?

It could use some polishing. Are you interested in committing it?

Not really - as a CFM I was trying to revive patches that seem in good
shape but not moving.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#4)
Re: Amcheck: do rightlink verification with lock coupling

Hi Peter! Sorry for answering so long.

11 янв. 2020 г., в 7:49, Peter Geoghegan <pg@bowt.ie> написал(а):

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

We were dealing with corruption caused by lost page update. Consider two pages:
A->B
If A is split into A` and C we get:
A`->C->B
But if update of A is lost we still have
A->B, but B backward pointers points to C.
B's smallest key is bigger than hikey of A`, but this do not violate
cross-pages invariant.

Page updates may be lost due to bug in backup software with incremental
backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
fsync error handling, bug in ssd firmware etc. And our data checksums do not
detect this kind of corruption. BTW I think that it would be better if our
checksums were not stored on a page itseft, they could detect this kind of faults.

We were able to timely detect all those problems on primaries in our testing
environment. But much later found out that some standbys were corrupted,
the problem appeared only when they were promoted.
Also, in nearby thread Grygory Rylov (g0djan) is trying to enable one more
invariant in standby checks.

Thanks!

Best regards, Andrey Borodin.

#9Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Andrey Borodin (#8)
Re: Amcheck: do rightlink verification with lock coupling

14 янв. 2020 г., в 9:47, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):

Page updates may be lost due to bug in backup software with incremental
backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
fsync error handling, bug in ssd firmware etc. And our data checksums do not
detect this kind of corruption. BTW I think that it would be better if our
checksums were not stored on a page itseft, they could detect this kind of faults.

Observed it just now.
There is one HA cluster where a node was marked dead. This node was disconnected from cluster, but due to human error there was postgres running.
Node managed to install block-level incremental backup to the chain. And backup software did not detect that backup step was taken from part of timeline that was not in actual timeline's history.
Result of restoration is:

man-w%/%db R # select bt_index_check('%.pk_%');
bt_index_check
----------------

(1 row)

Time: 1411.065 ms (00:01.411)
man-w%/%db R # select patched_index_check('%.pk_%');
ERROR: XX002: left link/right link pair in index "pk_labels" not in agreement
DETAIL: Block=42705 left block=42707 left link from block=45495.
LOCATION: bt_recheck_block_rightlink, verify_nbtree.c:621
Time: 671.336 ms

('%' is replacing removed chars)

I understand that this corruption was not introduced by postgres itself, but by combination of bug in two 3rd party tools and human error.
But I can imagine similar corruptions with different root causes.

Best regards, Andrey Borodin.

In reply to: Andrey Borodin (#8)
Re: Amcheck: do rightlink verification with lock coupling

On Mon, Jan 13, 2020 at 8:47 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

11 янв. 2020 г., в 7:49, Peter Geoghegan <pg@bowt.ie> написал(а):
I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

We were dealing with corruption caused by lost page update. Consider two pages:
A->B
If A is split into A` and C we get:
A`->C->B
But if update of A is lost we still have
A->B, but B backward pointers points to C.
B's smallest key is bigger than hikey of A`, but this do not violate
cross-pages invariant.

Page updates may be lost due to bug in backup software with incremental
backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
fsync error handling, bug in ssd firmware etc. And our data checksums do not
detect this kind of corruption. BTW I think that it would be better if our
checksums were not stored on a page itseft, they could detect this kind of faults.

I find this argument convincing. I'll try to get this committed soon.

While you could have used bt_index_parent_check() or heapallindexed to
detect the issue, those two options are a lot more expensive (plus the
former option won't work on a standby). Relaxing the principle that
says that we shouldn't hold multiple buffer locks concurrently doesn't
seem like that much to ask for to get such a clear benefit.

I think that this is safe, but page deletion/half-dead pages need more
thought. In general, the target page could have become "ignorable"
when rechecked.

We were able to timely detect all those problems on primaries in our testing
environment. But much later found out that some standbys were corrupted,
the problem appeared only when they were promoted.
Also, in nearby thread Grygory Rylov (g0djan) is trying to enable one more
invariant in standby checks.

I looked at that thread just now, but Grygory didn't say why this true
root check was particularly important, so I can't see much upside.
Plus that seems riskier than what you have in mind here.

Does it have something to do with the true root looking like a deleted
page? The details matter.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#10)
Re: Amcheck: do rightlink verification with lock coupling

On Thu, Jan 16, 2020 at 5:11 PM Peter Geoghegan <pg@bowt.ie> wrote:

I find this argument convincing. I'll try to get this committed soon.

While you could have used bt_index_parent_check() or heapallindexed to
detect the issue, those two options are a lot more expensive (plus the
former option won't work on a standby). Relaxing the principle that
says that we shouldn't hold multiple buffer locks concurrently doesn't
seem like that much to ask for to get such a clear benefit.

Having looked into it some more, I now have my doubts about this
patch. REDO routine code like btree_xlog_split() and
btree_xlog_unlink_page() feel entitled to only lock one page at a
time, which invalidates the assumption that we can couple locks on the
leaf level to verify mutual agreement in left and right sibling links
(with only an AccessShareLock on bt_index_check()'s target index
relation). It would definitely be safe for bt_index_check() to so this
were it not running in recovery mode, but that doesn't seem very
useful on its own.

I tried to come up with a specific example of how this could be
unsafe, but my explanation was all over the place (this could have had
something to do with it being Friday evening). Even still, it's up to
the patch to justify why it's safe, and that seems even more
difficult.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#11)
Re: Amcheck: do rightlink verification with lock coupling

On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan <pg@bowt.ie> wrote:

I tried to come up with a specific example of how this could be
unsafe, but my explanation was all over the place (this could have had
something to do with it being Friday evening). Even still, it's up to
the patch to justify why it's safe, and that seems even more
difficult.

I can't see a way around this problem, so I'm marking the patch rejected.

Thanks
--
Peter Geoghegan

#13Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#12)
Re: Amcheck: do rightlink verification with lock coupling

Hi!

23 янв. 2020 г., в 00:59, Peter Geoghegan <pg@bowt.ie> написал(а):

On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan <pg@bowt.ie> wrote:

I tried to come up with a specific example of how this could be
unsafe, but my explanation was all over the place (this could have had
something to do with it being Friday evening). Even still, it's up to
the patch to justify why it's safe, and that seems even more
difficult.

I can't see a way around this problem, so I'm marking the patch rejected.

In this thread [0]/messages/by-id/CAH2-Wzm7T_O+VUeo7=_NGPncs08z3JEybEwVLZGaASnbfg5vDA@mail.gmail.com we decided that lock coupling is necessary for btree_xlog_unlink_page().
So, maybe let's revive this patch?

Thanks!

Best regards, Andrey Borodin.

[0]: /messages/by-id/CAH2-Wzm7T_O+VUeo7=_NGPncs08z3JEybEwVLZGaASnbfg5vDA@mail.gmail.com

In reply to: Andrey M. Borodin (#13)
Re: Amcheck: do rightlink verification with lock coupling

On Mon, Jul 20, 2020 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

In this thread [0] we decided that lock coupling is necessary for btree_xlog_unlink_page().
So, maybe let's revive this patch?

Yes, let's do that. Can you resubmit it, please?

Peter Geoghegan

#15Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#14)
1 attachment(s)
Re: Amcheck: do rightlink verification with lock coupling

4 авг. 2020 г., в 03:44, Peter Geoghegan <pg@bowt.ie> написал(а):

On Mon, Jul 20, 2020 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

In this thread [0] we decided that lock coupling is necessary for btree_xlog_unlink_page().
So, maybe let's revive this patch?

Yes, let's do that. Can you resubmit it, please?

PFA v3.
Changes: fixed few typos in comments.

BTW, reviewing this patch again I cannot understand why we verify link coherence only on leaf level but not for internal pages?
I do not see any actual problems here.

Corruption detection power of leftlink-rightlinks on internal pages is diminishingly small compared to leaf pages. But there seems to be no reason to exclude internal pages?

Thanks!

Best regards, Andrey Borodin.

Attachments:

v3-0001-In-amcheck-nbtree-do-rightlink-verification-with-.patchapplication/octet-stream; name=v3-0001-In-amcheck-nbtree-do-rightlink-verification-with-.patch; x-unix-mode=0644Download
From daaa9c92789f393404d7a5d3ec685cee46d9c3c8 Mon Sep 17 00:00:00 2001
From: Andrey <amborodin@acm.org>
Date: Thu, 15 Aug 2019 16:43:16 +0300
Subject: [PATCH v3] In amcheck nbtree do rightlink verification with lock
 coupling

When doing nbtree verification we ignore rightlink-leftling invariant violations
unless ShareLock is taken. To ensure detection of corruption this behaiovr is changed.
With this commit we couple page locks and recheck links consistency again.
---
 contrib/amcheck/verify_nbtree.c | 41 +++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..46dfe9be1f 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -143,6 +143,7 @@ static inline bool btree_index_mainfork_expected(Relation rel);
 static void bt_check_every_level(Relation rel, Relation heaprel,
 								 bool heapkeyspace, bool readonly, bool heapallindexed,
 								 bool rootdescend);
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
 											   BtreeLevel level);
 static void bt_target_page_check(BtreeCheckState *state);
@@ -620,6 +621,42 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 	MemoryContextDelete(state->targetcontext);
 }
 
+/*
+ * Verify the coherence of rightlink's leftlink under lock
+ */
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno)
+{
+	Buffer		lbuffer, rbuffer;
+	Page		lpage,rpage;
+	BTPageOpaque lopaque, ropaque;
+
+	/* Read and lock left block */
+	lbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftblockno, RBM_NORMAL,
+								state->checkstrategy);
+	LockBuffer(lbuffer, BT_READ);
+	lpage = BufferGetPage(lbuffer);
+	lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
+
+	/* Read right block by following lopaque->btpo_next of left block */
+	rbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, lopaque->btpo_next, RBM_NORMAL,
+								 state->checkstrategy);
+	/* Here we are going to couple locks on left block and right block */
+	LockBuffer(rbuffer, BT_READ);
+	rpage = BufferGetPage(rbuffer);
+	ropaque = (BTPageOpaque) PageGetSpecialPointer(rpage);
+
+	if (ropaque->btpo_prev != leftblockno)
+		ereport(ERROR,
+				(errcode(ERRCODE_INDEX_CORRUPTED),
+				 errmsg("left link/right link pair in index \"%s\" not in agreement",
+						RelationGetRelationName(state->rel)),
+				 errdetail_internal("Block=%u left block=%u left link from block=%u.",
+									leftblockno, lopaque->btpo_next, ropaque->btpo_prev)));
+
+	UnlockReleaseBuffer(lbuffer);
+	UnlockReleaseBuffer(rbuffer);
+}
+
 /*
  * Given a left-most block at some level, move right, verifying each page
  * individually (with more verification across pages for "readonly"
@@ -778,6 +815,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 		/*
 		 * readonly mode can only ever land on live pages and half-dead pages,
 		 * so sibling pointers should always be in mutual agreement
+		 * if not in readonly mode - we have to recheck links under lock
 		 */
 		if (state->readonly && opaque->btpo_prev != leftcurrent)
 			ereport(ERROR,
@@ -786,6 +824,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 							RelationGetRelationName(state->rel)),
 					 errdetail_internal("Block=%u left block=%u left link from block=%u.",
 										current, leftcurrent, opaque->btpo_prev)));
+		else if (opaque->btpo_prev != leftcurrent && level.level == 0)
+		/* on leaf level - recheck rightlinks */
+			bt_recheck_block_rightlink(state, leftcurrent);
 
 		/* Check level, which must be valid for non-ignorable page */
 		if (level.level != opaque->btpo.level)
-- 
2.24.3 (Apple Git-128)

In reply to: Andrey M. Borodin (#15)
1 attachment(s)
Re: Amcheck: do rightlink verification with lock coupling

On Tue, Aug 4, 2020 at 9:33 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

BTW, reviewing this patch again I cannot understand why we verify link coherence only on leaf level but not for internal pages?
I do not see any actual problems here.

Well, I thought that it might be a good idea to limit it to the leaf
level, based on the theory that we rarely couple locks on internal
page levels in general. But yeah, that's probably not a good enough
reason to avoid lock coupling on internal pages. It's probably better
to do it everywhere than to explain why we don't do it on the internal
level -- the explanation will probably be confusing. And even if there
was a performance issue, it could only happen when there are
concurrent internal page splits -- but those are supposed to be rare.

Attached is v4, which now checks internal pages (just like leaf
pages). The main other change in this revised version is that we make
the error raised by bt_index_check() match the error used in the old
bt_index_parent_check() case -- we always want to blame the current
target page when amcheck complains (technically the page we blame when
the invariant fails isn't strictly guaranteed to be quite the same
thing as the target, but it's close enough to not really matter in
reality). Other adjustments:

* Added _bt_checkpage() calls for buffers, as is standard practice in nbtree.

* Added protection against locking the same page a second time in the
event of a faulty sibling link -- we should avoid a self-deadlock in
the event of a page that is corrupt in just the wrong way.

* Updated obsolescent comments that claimed that we never couple
buffer locks in amcheck.

I would like to commit something like this in the next day or two.

Thoughts?

--
Peter Geoghegan

Attachments:

v4-0001-Add-amcheck-sibling-link-checks.patchapplication/octet-stream; name=v4-0001-Add-amcheck-sibling-link-checks.patchDownload
From 8f141b150a442f54a5d271e430e173f1942fbde6 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 5 Aug 2020 16:07:46 -0700
Subject: [PATCH v4] Add amcheck sibling link checks.

Teach contrib/amcheck's bt_index_check() function to check agreement
between left links and right links between siblings.  This was always
possible with bt_index_parent_check(), but the check tends to catch a
lot of real world issues, and can only be used on replicas with the
former function (since it only requires an AccessShareLock).
---
 contrib/amcheck/verify_nbtree.c | 123 ++++++++++++++++++++++++++++----
 1 file changed, 110 insertions(+), 13 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c9f9e755dc..cf1860110f 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -145,6 +145,9 @@ static void bt_check_every_level(Relation rel, Relation heaprel,
 								 bool rootdescend);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
 											   BtreeLevel level);
+static void bt_recheck_sibling_links(BtreeCheckState *state,
+									 BlockNumber btpo_prev_from_target,
+									 BlockNumber leftcurrent);
 static void bt_target_page_check(BtreeCheckState *state);
 static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state);
 static void bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
@@ -775,17 +778,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 			 */
 		}
 
-		/*
-		 * readonly mode can only ever land on live pages and half-dead pages,
-		 * so sibling pointers should always be in mutual agreement
-		 */
-		if (state->readonly && opaque->btpo_prev != leftcurrent)
-			ereport(ERROR,
-					(errcode(ERRCODE_INDEX_CORRUPTED),
-					 errmsg("left link/right link pair in index \"%s\" not in agreement",
-							RelationGetRelationName(state->rel)),
-					 errdetail_internal("Block=%u left block=%u left link from block=%u.",
-										current, leftcurrent, opaque->btpo_prev)));
+		/* Sibling links should be in mutual agreement */
+		if (opaque->btpo_prev != leftcurrent)
+			bt_recheck_sibling_links(state, opaque->btpo_prev, leftcurrent);
 
 		/* Check level, which must be valid for non-ignorable page */
 		if (level.level != opaque->btpo.level)
@@ -865,6 +860,106 @@ nextpage:
 	return nextleveldown;
 }
 
+/*
+ * Raise an error when target page's left link does not match the right link
+ * from the page that was previously the target page (called leftcurrent
+ * here).  Make sure that this condition has definitely been violated in the
+ * !readonly case, where the apparent violation of this invariant is probably
+ * just due to a harmless concurrent page split.
+ *
+ * This is the only place where amcheck will ever "couple" buffer locks (in
+ * the style of_bt_check_unique).  This is only needed in !readonly mode.  We
+ * generally prefer to avoid more thorough cross-page checks in !readonly
+ * mode, but this case is relatively important, and isn't very disruptive in
+ * practice.  Also, we probably won't end up here all that often when the
+ * index isn't corrupt (that only happens in the event of a concurrent page
+ * split), so the performance impact of lock coupling here should be
+ * acceptable in practice.
+ */
+static void
+bt_recheck_sibling_links(BtreeCheckState *state,
+						 BlockNumber btpo_prev_from_target,
+						 BlockNumber leftcurrent)
+{
+	if (!state->readonly)
+	{
+		Buffer		lbuffer;
+		Buffer		rbuffer;
+		Page		page;
+		BTPageOpaque opaque;
+		BlockNumber rightblock;
+
+		/*
+		 * Actual recheck is required when we only have an AccessShareLock on
+		 * relation.  Couple locks in the usual order for nbtree: Left to
+		 * right.
+		 */
+		lbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftcurrent,
+									 RBM_NORMAL, state->checkstrategy);
+		LockBuffer(lbuffer, BT_READ);
+		_bt_checkpage(state->rel, lbuffer);
+		page = BufferGetPage(lbuffer);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		rightblock = opaque->btpo_next;
+
+		/*
+		 * Avoid deadlock in rare cases where sibling link points to the page
+		 * that currently contains it
+		 */
+		if (rightblock != leftcurrent)
+		{
+			rbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, rightblock,
+										 RBM_NORMAL, state->checkstrategy);
+			LockBuffer(rbuffer, BT_READ);
+			_bt_checkpage(state->rel, rbuffer);
+			page = BufferGetPage(rbuffer);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			/* btpo_prev_from_target may have changed; update it */
+			btpo_prev_from_target = opaque->btpo_prev;
+		}
+		else
+		{
+			/*
+			 * The "cross page item order invariant" check will generally
+			 * catch this first, but don't rely on that from this distance
+			 */
+			rbuffer = InvalidBuffer;
+			btpo_prev_from_target = P_NONE;
+		}
+
+		if (BufferIsValid(rbuffer))
+			UnlockReleaseBuffer(rbuffer);
+		UnlockReleaseBuffer(lbuffer);
+
+		if (btpo_prev_from_target == leftcurrent)
+		{
+			ereport(DEBUG1,
+					(errcode(ERRCODE_NO_DATA),
+					 errmsg("harmless concurrent page split detected in index %s",
+							RelationGetRelationName(state->rel)),
+					 errdetail_internal("Block=%u right sibling=%u.",
+										leftcurrent, rightblock)));
+			return;
+		}
+
+		/*
+		 * Index is corrupt.  Make sure that we report correct target.  This
+		 * theoretically could have changed in rare cases where there is index
+		 * corruption as well as a concurrent page split. (Note that
+		 * btpo_prev_from_target was already updated above.)
+		 */
+		state->targetblock = rightblock;
+	}
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INDEX_CORRUPTED),
+			 errmsg("left link/right link pair in index \"%s\" not in agreement",
+					RelationGetRelationName(state->rel)),
+			 errdetail_internal("Block=%u left block=%u left link from block=%u.",
+								state->targetblock, leftcurrent,
+								btpo_prev_from_target)));
+}
+
 /*
  * Function performs the following checks on target page, or pages ancillary to
  * target page:
@@ -1964,8 +2059,8 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
 	 * On the other hand, we'd need to lock down race conditions involving
 	 * deletion of child's left page, for long enough to read the child page
 	 * into memory (in other words, a scheme with concurrently held buffer
-	 * locks on both child and left-of-child pages).  That's unacceptable for
-	 * amcheck functions on general principle, though.
+	 * locks on both child and left-of-child pages).  That hardly seems worth
+	 * the trouble.
 	 */
 	Assert(state->readonly);
 
@@ -2774,6 +2869,8 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
  * There is never an attempt to get a consistent view of multiple pages using
  * multiple concurrent buffer locks; in general, we only acquire a single pin
  * and buffer lock at a time, which is often all that the nbtree code requires.
+ * (Actually, bt_recheck_sibling_links couples buffer locks, which is the only
+ * exception to this general rule.)
  *
  * Operating on a copy of the page is useful because it prevents control
  * getting stuck in an uninterruptible state when an underlying operator class
-- 
2.25.1

#17Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#16)
Re: Amcheck: do rightlink verification with lock coupling

6 авг. 2020 г., в 04:25, Peter Geoghegan <pg@bowt.ie> написал(а):

* Added _bt_checkpage() calls for buffers, as is standard practice in nbtree.

* Added protection against locking the same page a second time in the
event of a faulty sibling link -- we should avoid a self-deadlock in
the event of a page that is corrupt in just the wrong way.

* Updated obsolescent comments that claimed that we never couple
buffer locks in amcheck.

Cool, thanks!
There's mintor typo: missing space in "of_bt_check_unique".

I would like to commit something like this in the next day or two.

Thoughts?

Sounds great! Thanks!

Best regards, Andrey Borodin.

In reply to: Andrey M. Borodin (#17)
2 attachment(s)
Re: Amcheck: do rightlink verification with lock coupling

On Wed, Aug 5, 2020 at 9:50 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Sounds great! Thanks!

I'm afraid that there is another problem, this time with
btree_xlog_split(). It's possible to get false positives when running
the new test continually on a standby. You can see this by running
verification on a standby continually, while the primary runs with a
workload that gets many page splits.

It's easy to see if you apply this patch:

--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -435,6 +435,9 @@ btree_xlog_split(bool newitemonleft,
XLogReaderState *record)
  UnlockReleaseBuffer(lbuf);
  UnlockReleaseBuffer(rbuf);
+ /* trick */
+ pg_usleep(10 * 1000L);
+

The only thing that we can do is adjust the locking in
btree_xlog_split() to match the primary (kind of like commit 9a9db08a,
except with page splits instead of page deletion). Attached is a
revised version of the patch, along with the changes that we'd need to
REDO to make the amcheck patch really work.

I'm not sure if this change to the REDO routine is worth the overhead
or trouble, though. I have to think about it some more.

BTW, the first patch in the series now has a new check for page
deletion -- that was missing from v4.

--
Peter Geoghegan

Attachments:

v5-0002-Adjust-btree_xlog_split-locking.patchapplication/octet-stream; name=v5-0002-Adjust-btree_xlog_split-locking.patchDownload
From 060617843dae5f91a0b5f94da3cd9cd4f4dd2cd0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 6 Aug 2020 09:34:15 -0700
Subject: [PATCH v5 2/2] Adjust btree_xlog_split() locking.

---
 src/backend/access/nbtree/nbtxlog.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index be0fa450f3..a228c3a986 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -427,21 +427,8 @@ btree_xlog_split(bool newitemonleft, XLogReaderState *record)
 		MarkBufferDirty(lbuf);
 	}
 
-	/*
-	 * We no longer need the buffers.  They must be released together, so that
-	 * readers cannot observe two inconsistent halves.
-	 */
-	if (BufferIsValid(lbuf))
-		UnlockReleaseBuffer(lbuf);
-	UnlockReleaseBuffer(rbuf);
-
 	/*
 	 * Fix left-link of the page to the right of the new right sibling.
-	 *
-	 * Note: in normal operation, we do this while still holding lock on the
-	 * two split pages.  However, that's not necessary for correctness in WAL
-	 * replay, because no other index update can be in progress, and readers
-	 * will cope properly when following an obsolete left-link.
 	 */
 	if (rnext != P_NONE)
 	{
@@ -460,6 +447,15 @@ btree_xlog_split(bool newitemonleft, XLogReaderState *record)
 		if (BufferIsValid(buffer))
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * We no longer need the buffers.  They must be released together, so that
+	 * readers cannot observe two inconsistent halves.
+	 */
+	if (BufferIsValid(lbuf))
+		UnlockReleaseBuffer(lbuf);
+	UnlockReleaseBuffer(rbuf);
+
 }
 
 static void
-- 
2.25.1

v5-0001-Add-amcheck-sibling-link-checks.patchapplication/octet-stream; name=v5-0001-Add-amcheck-sibling-link-checks.patchDownload
From b946d148c6cc983b9b2f22bf8d1c2dd8bed5b62b Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 5 Aug 2020 16:07:46 -0700
Subject: [PATCH v5 1/2] Add amcheck sibling link checks.

Teach contrib/amcheck's bt_index_check() function to check agreement
between left links and right links between siblings.  This was always
possible with bt_index_parent_check(), but the check tends to catch a
lot of real world issues, and can only be used on replicas with the
former function (since it only requires an AccessShareLock).
---
 contrib/amcheck/verify_nbtree.c | 132 ++++++++++++++++++++++++++++----
 1 file changed, 119 insertions(+), 13 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c9f9e755dc..fdf0601f4a 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -145,6 +145,9 @@ static void bt_check_every_level(Relation rel, Relation heaprel,
 								 bool rootdescend);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
 											   BtreeLevel level);
+static void bt_recheck_sibling_links(BtreeCheckState *state,
+									 BlockNumber btpo_prev_from_target,
+									 BlockNumber leftcurrent);
 static void bt_target_page_check(BtreeCheckState *state);
 static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state);
 static void bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
@@ -775,17 +778,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 			 */
 		}
 
-		/*
-		 * readonly mode can only ever land on live pages and half-dead pages,
-		 * so sibling pointers should always be in mutual agreement
-		 */
-		if (state->readonly && opaque->btpo_prev != leftcurrent)
-			ereport(ERROR,
-					(errcode(ERRCODE_INDEX_CORRUPTED),
-					 errmsg("left link/right link pair in index \"%s\" not in agreement",
-							RelationGetRelationName(state->rel)),
-					 errdetail_internal("Block=%u left block=%u left link from block=%u.",
-										current, leftcurrent, opaque->btpo_prev)));
+		/* Sibling links should be in mutual agreement */
+		if (opaque->btpo_prev != leftcurrent)
+			bt_recheck_sibling_links(state, opaque->btpo_prev, leftcurrent);
 
 		/* Check level, which must be valid for non-ignorable page */
 		if (level.level != opaque->btpo.level)
@@ -865,6 +860,115 @@ nextpage:
 	return nextleveldown;
 }
 
+/*
+ * Raise an error when target page's left link does not match the right link
+ * from the page that was previously the target page (called leftcurrent
+ * here).  Make sure that this condition has definitely been violated in the
+ * !readonly case, where the apparent violation of this invariant is probably
+ * just due to a harmless concurrent page split.
+ *
+ * This is the only place where amcheck will "couple" buffer locks (in
+ * !readonly mode).  In general we prefer to simply avoid more thorough
+ * cross-page checks in !readonly mode instead, but we're more ambitious here.
+ * Inconsistencies among sibling page's sibling links often occur when a page
+ * split is reflected in some but not all B-Tree leaf pages.  That's a common
+ * indicator of subtle problems during recovery.  Detecting this even when
+ * running on a replica is quite valuable.
+ *
+ * We probably won't end up here all that often when the index isn't corrupt.
+ * That only happens for !readonly callers, and only in the event of a
+ * concurrent page split.  The performance impact of performing lock coupling
+ * here is minimal in practice.
+ */
+static void
+bt_recheck_sibling_links(BtreeCheckState *state,
+						 BlockNumber btpo_prev_from_target,
+						 BlockNumber leftcurrent)
+{
+	if (!state->readonly)
+	{
+		Buffer		lbuf;
+		Buffer		newtargetbuf;
+		Page		page;
+		BTPageOpaque opaque;
+		BlockNumber	newtargetblock;
+
+		/*
+		 * Actual recheck is required when we only have an AccessShareLock on
+		 * relation.  Couple locks in the usual order for nbtree: Left to
+		 * right.
+		 */
+		lbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftcurrent,
+								  RBM_NORMAL, state->checkstrategy);
+		LockBuffer(lbuf, BT_READ);
+		_bt_checkpage(state->rel, lbuf);
+		page = BufferGetPage(lbuf);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		if (P_ISDELETED(opaque))
+		{
+			/*
+			 * Cannot reason about concurrently deleted page -- no sibling
+			 * links.  (We don't have to consider this with new target page
+			 * below because we'll have locked lbuf there.)
+			 */
+			UnlockReleaseBuffer(lbuf);
+			return;
+		}
+
+		newtargetblock = opaque->btpo_next;
+		if (newtargetblock != leftcurrent)
+		{
+			newtargetbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM,
+											  newtargetblock, RBM_NORMAL,
+											  state->checkstrategy);
+			LockBuffer(newtargetbuf, BT_READ);
+			_bt_checkpage(state->rel, newtargetbuf);
+			page = BufferGetPage(newtargetbuf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			/* btpo_prev_from_target may have changed; update it */
+			btpo_prev_from_target = opaque->btpo_prev;
+		}
+		else
+		{
+			/* Self-deadlock avoided, but index is corrupt */
+			newtargetbuf = InvalidBuffer;
+			btpo_prev_from_target = P_NONE;
+		}
+
+		if (BufferIsValid(newtargetbuf))
+			UnlockReleaseBuffer(newtargetbuf);
+		UnlockReleaseBuffer(lbuf);
+
+		if (btpo_prev_from_target == leftcurrent)
+		{
+			/* Report split in left sibling, not target (or new target) */
+			ereport(DEBUG1,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("harmless concurrent page split detected in index %s",
+							RelationGetRelationName(state->rel)),
+					 errdetail_internal("Block=%u right sibling=%u.",
+										leftcurrent, newtargetblock)));
+			return;
+		}
+
+		/*
+		 * Index is corrupt.  Make sure that we report correct target.  This
+		 * theoretically could have changed in rare cases where there is index
+		 * corruption as well as a concurrent page split. (Note that
+		 * btpo_prev_from_target was already updated above.)
+		 */
+		state->targetblock = newtargetblock;
+	}
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INDEX_CORRUPTED),
+			 errmsg("left link/right link pair in index \"%s\" not in agreement",
+					RelationGetRelationName(state->rel)),
+			 errdetail_internal("Block=%u left block=%u left link from block=%u.",
+								state->targetblock, leftcurrent,
+								btpo_prev_from_target)));
+}
+
 /*
  * Function performs the following checks on target page, or pages ancillary to
  * target page:
@@ -1964,8 +2068,8 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
 	 * On the other hand, we'd need to lock down race conditions involving
 	 * deletion of child's left page, for long enough to read the child page
 	 * into memory (in other words, a scheme with concurrently held buffer
-	 * locks on both child and left-of-child pages).  That's unacceptable for
-	 * amcheck functions on general principle, though.
+	 * locks on both child and left-of-child pages).  That hardly seems worth
+	 * the trouble.
 	 */
 	Assert(state->readonly);
 
@@ -2774,6 +2878,8 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
  * There is never an attempt to get a consistent view of multiple pages using
  * multiple concurrent buffer locks; in general, we only acquire a single pin
  * and buffer lock at a time, which is often all that the nbtree code requires.
+ * (Actually, bt_recheck_sibling_links couples buffer locks, which is the only
+ * exception to this general rule.)
  *
  * Operating on a copy of the page is useful because it prevents control
  * getting stuck in an uninterruptible state when an underlying operator class
-- 
2.25.1

#19Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#18)
Re: Amcheck: do rightlink verification with lock coupling

6 авг. 2020 г., в 21:38, Peter Geoghegan <pg@bowt.ie> написал(а):

On Wed, Aug 5, 2020 at 9:50 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Sounds great! Thanks!

I'm afraid that there is another problem, this time with
btree_xlog_split(). It's possible to get false positives when running
the new test continually on a standby. You can see this by running
verification on a standby continually, while the primary runs with a
workload that gets many page splits.

Yes, I see the problem...

The only thing that we can do is adjust the locking in
btree_xlog_split() to match the primary (kind of like commit 9a9db08a,
except with page splits instead of page deletion). Attached is a
revised version of the patch, along with the changes that we'd need to
REDO to make the amcheck patch really work.

I'm not sure if this change to the REDO routine is worth the overhead
or trouble, though. I have to think about it some more.

If we want to check relations between pages we must either apply them together (under locks) or tolerate some fraction of false positives. I understand that mitigating and tolerating false positives is nonsense in mathematica sense, but from practical point of view it's just OK.

But having complete solution with no false positives seems much better.

BTW, the first patch in the series now has a new check for page
deletion -- that was missing from v4.

Yes, seems like that was a bug..

Thanks!

Best regards, Andrey Borodin.

In reply to: Andrey M. Borodin (#19)
Re: Amcheck: do rightlink verification with lock coupling

On Thu, Aug 6, 2020 at 10:59 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

But having complete solution with no false positives seems much better.

Agreed. I know that you didn't pursue this for no reason -- having the
check available makes bt_check_index() a lot more valuable in
practice. It detects what is actually a classic example of subtle
B-Tree corruption (left link corruption), which appears in Modern
B-Tree techniques in its discussion of corruption detection. It's
actually the canonical example of how B-Tree corruption can be very
subtle in the real world.

I pushed a cleaned up version of this patch just now. I added some
commentary about this canonical example in header comments for the new
function.

Thanks
--
Peter Geoghegan

#21Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#20)
Re: Amcheck: do rightlink verification with lock coupling

8 авг. 2020 г., в 23:14, Peter Geoghegan <pg@bowt.ie> написал(а):

I pushed a cleaned up version of this patch just now. I added some
commentary about this canonical example in header comments for the new
function.

Thanks for working on this!

Best regards, Andrey Borodin.