Do not check unlogged indexes on standby

Started by Andrey Borodinover 6 years ago12 messages
#1Andrey Borodin
x4mmm@yandex-team.ru
1 attachment(s)

Hi hackers!

Currently, if we check indexes on standby we often get

man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid');
ERROR: 58P01: could not open file "base/16453/125407": No such file or directory

I think that we should print warning and that's it. Amcheck should not give false positives.

Or, maybe, there are some design considerations that I miss?

BTW I really want to enable rightlink-leftlink invariant validation on standby..

Thanks!

Best regards, Andrey Borodin.

Attachments:

0001-Do-not-verify-unlogged-indexes-during-amcheck.patchapplication/octet-stream; name=0001-Do-not-verify-unlogged-indexes-during-amcheck.patch; x-unix-mode=0644Download
From 8e8eada51518ab4b77f1c99b8c691784fd938fd7 Mon Sep 17 00:00:00 2001
From: Andrey <amborodin@acm.org>
Date: Mon, 12 Aug 2019 14:37:40 +0500
Subject: [PATCH] Do not verify unlogged indexes during amcheck

On standby unlogged index may be absent and user gets error.
In this case we just emmit warning to user.
---
 contrib/amcheck/verify_nbtree.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index a1438a2855..ef7e411cdb 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -119,7 +119,7 @@ PG_FUNCTION_INFO_V1(bt_index_parent_check);
 
 static void bt_index_check_internal(Oid indrelid, bool parentcheck,
 						bool heapallindexed);
-static inline void btree_index_checkable(Relation rel);
+static inline bool btree_index_checkable(Relation rel);
 static void bt_check_every_level(Relation rel, Relation heaprel,
 					 bool readonly, bool heapallindexed);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
@@ -248,10 +248,9 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed)
 						RelationGetRelationName(indrel))));
 
 	/* Relation suitable for checking as B-Tree? */
-	btree_index_checkable(indrel);
-
-	/* Check index, possibly against table it is an index on */
-	bt_check_every_level(indrel, heaprel, parentcheck, heapallindexed);
+	if (btree_index_checkable(indrel))
+		/* Check index, possibly against table it is an index on */
+		bt_check_every_level(indrel, heaprel, parentcheck, heapallindexed);
 
 	/*
 	 * Release locks early. That's ok here because nothing in the called
@@ -271,7 +270,7 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed)
  * callable by non-superusers. If granted, it's useful to be able to check a
  * whole cluster.
  */
-static inline void
+static inline bool
 btree_index_checkable(Relation rel)
 {
 	if (rel->rd_rel->relkind != RELKIND_INDEX ||
@@ -295,6 +294,18 @@ btree_index_checkable(Relation rel)
 				 errmsg("cannot check index \"%s\"",
 						RelationGetRelationName(rel)),
 				 errdetail("Index is not valid")));
+
+	if (!RelationNeedsWAL(rel) && RecoveryInProgress())
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot check index \"%s\"",
+						RelationGetRelationName(rel)),
+				 errdetail("Index is unlogged and recovery is in progress")));
+		return false;
+	}
+
+	return true;
 }
 
 /*
-- 
2.20.1

In reply to: Andrey Borodin (#1)
Re: Do not check unlogged indexes on standby

On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Currently, if we check indexes on standby we often get

man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid');
ERROR: 58P01: could not open file "base/16453/125407": No such file or directory

I think that we should print warning and that's it. Amcheck should not give false positives.

I agree -- amcheck should just skip over unlogged tables during
recovery, since there is simply nothing to check.

I pushed your patch to all branches that have amcheck just now, so now
we skip over unlogged relations when in recovery, though I made some
revisions.

Your patch didn't handle temp tables/indexes that were created in the
first session correctly -- we must be careful about the distinction
between unlogged tables, and tables that don't require WAL logging
(the later includes temp tables). Also, I thought that it was a good
idea to actively test for the presence of a main fork when we don't
skip (i.e. when the system isn't in recovery and the B-Tree indexes
isn't unlogged) -- we now give a clean report of corruption when that
happens, rather than letting an ambiguous "can't happen" error get
raised by low-level code. This might be possible with system catalog
corruption, for example. Finally, I thought that the WARNING was a bit
strong -- a NOTICE is more appropriate.

Thanks!

--
Peter Geoghegan

In reply to: Andrey Borodin (#1)
Re: Do not check unlogged indexes on standby

On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

BTW I really want to enable rightlink-leftlink invariant validation on standby..

That seems very hard. My hope was that bt_check_index() can detect the
same problem a different way. The bt_right_page_check_scankey()
cross-page check (which needs nothing more than an AccessShareLock)
will often detect such problems, because the page image itself will be
totally wrong in some way.

I'm guessing that you have direct experience with that *not* being
good enough, though. Can you share further details? I suppose that
bt_right_page_check_scankey() helps with transposed pages, but doesn't
help so much when you have WAL-level inconsistencies.

--
Peter Geoghegan

#4Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#3)
Re: Do not check unlogged indexes on standby

13 авг. 2019 г., в 3:23, Peter Geoghegan <pg@bowt.ie> написал(а):

I pushed your patch to all branches that have amcheck just now, so now
we skip over unlogged relations when in recovery, though I made some
revisions.

Oh, cool, thanks!

Your patch didn't handle temp tables/indexes that were created in the
first session correctly -- we must be careful about the distinction
between unlogged tables, and tables that don't require WAL logging
(the later includes temp tables). Also, I thought that it was a good
idea to actively test for the presence of a main fork when we don't
skip (i.e. when the system isn't in recovery and the B-Tree indexes
isn't unlogged) -- we now give a clean report of corruption when that
happens, rather than letting an ambiguous "can't happen" error get
raised by low-level code. This might be possible with system catalog
corruption, for example. Finally, I thought that the WARNING was a bit
strong -- a NOTICE is more appropriate.

+1

13 авг. 2019 г., в 3:36, Peter Geoghegan <pg@bowt.ie> написал(а):

On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

BTW I really want to enable rightlink-leftlink invariant validation on standby..

That seems very hard. My hope was that bt_check_index() can detect the
same problem a different way. The bt_right_page_check_scankey()
cross-page check (which needs nothing more than an AccessShareLock)
will often detect such problems, because the page image itself will be
totally wrong in some way.

I'm guessing that you have direct experience with that *not* being
good enough, though. Can you share further details? I suppose that
bt_right_page_check_scankey() helps with transposed pages, but doesn't
help so much when you have WAL-level inconsistencies.

We have a bunch of internal testing HA clusters that suffered from corruption conditions.
We fixed everything that can be detected with parent-check on primaries or usual check on standbys.
(page updates were lost both on primary and during WAL replay)
But from time to time when clusters switch primary from one availability zone to another we observe
"right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index"

We are going to search for these clusters with this [0]https://github.com/x4m/amcheck/commit/894d8bafb3c9a26bbc168ea5f4f33bcd1fc9f495 tolerating possible fraction of false positives, we have them anyway.
But I think I could put some effort into making corruption-detection tooling better.
I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck.

Best regards, Andrey Borodin.

[0]: https://github.com/x4m/amcheck/commit/894d8bafb3c9a26bbc168ea5f4f33bcd1fc9f495

In reply to: Andrey Borodin (#4)
Re: Do not check unlogged indexes on standby

On Tue, Aug 13, 2019 at 5:17 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

We have a bunch of internal testing HA clusters that suffered from corruption conditions.
We fixed everything that can be detected with parent-check on primaries or usual check on standbys.
(page updates were lost both on primary and during WAL replay)
But from time to time when clusters switch primary from one availability zone to another we observe
"right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index"

That sounds like an issue caused by a failure to replay all available
WAL, where only one page happened to get written out by a checkpoint
before a crash. It's something like that. That wouldn't be caught by
the cross-page bt_index_check() check that we do already.

We are going to search for these clusters with this [0] tolerating possible fraction of false positives, we have them anyway.
But I think I could put some effort into making corruption-detection tooling better.
I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck.

That's one possibility. When I first designed amcheck it was important
to be conservative, so I invented a general rule about never acquiring
multiple buffer locks at once. I still think that that was the correct
decision for the bt_downlink_check() check (the main extra
bt_index_parent_check() check), but I think that you're right about
retrying to verify the sibling links when bt_index_check() is called
from SQL.

nbtree will often "couple" buffer locks on the leaf level; it will
acquire a lock on a leaf page, and not release that lock until it has
also acquired a lock on the right sibling page (I'm mostly thinking of
_bt_stepright()). I am in favor of a patch that makes amcheck perform
sibling link verification within bt_index_check(), by retrying while
pessimistically coupling buffer locks. (Though I think that that
should just happen on the leaf level. We should not try to be too
clever about ignorable/half-dead/deleted pages, to be conservative.)

--
Peter Geoghegan

#6Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Peter Geoghegan (#5)
1 attachment(s)
Re: Do not check unlogged indexes on standby

13 авг. 2019 г., в 20:30, Peter Geoghegan <pg@bowt.ie> написал(а):

That's one possibility. When I first designed amcheck it was important
to be conservative, so I invented a general rule about never acquiring
multiple buffer locks at once. I still think that that was the correct
decision for the bt_downlink_check() check (the main extra
bt_index_parent_check() check), but I think that you're right about
retrying to verify the sibling links when bt_index_check() is called
from SQL.

nbtree will often "couple" buffer locks on the leaf level; it will
acquire a lock on a leaf page, and not release that lock until it has
also acquired a lock on the right sibling page (I'm mostly thinking of
_bt_stepright()). I am in favor of a patch that makes amcheck perform
sibling link verification within bt_index_check(), by retrying while
pessimistically coupling buffer locks. (Though I think that that
should just happen on the leaf level. We should not try to be too
clever about ignorable/half-dead/deleted pages, to be conservative.)

PFA V1 of this check retry.

Best regards, Andrey Borodin.

Attachments:

0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patchapplication/octet-stream; name=0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patch; x-unix-mode=0644Download
From 71ce0f09c601f6dfacf0fb807f240e5c7e6374ab Mon Sep 17 00:00:00 2001
From: Andrey <amborodin@acm.org>
Date: Thu, 15 Aug 2019 16:43:16 +0300
Subject: [PATCH] In amcheck nbtree do rightlink verification with lock
 coupling

When doing nbtree verification we ignore rightlink-leftling invariant violations
unless ShareLock is taken. To ensure detection of corruption, we couple page
locks and recheck links consistency again.
---
 contrib/amcheck/verify_nbtree.c | 41 +++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index ef7e411cdb..c6467530ed 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -122,6 +122,7 @@ static void bt_index_check_internal(Oid indrelid, bool parentcheck,
 static inline bool btree_index_checkable(Relation rel);
 static void bt_check_every_level(Relation rel, Relation heaprel,
 					 bool readonly, bool heapallindexed);
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno);
 static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
 							 BtreeLevel level);
 static void bt_target_page_check(BtreeCheckState *state);
@@ -560,6 +561,42 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 	MemoryContextDelete(state->targetcontext);
 }
 
+/*
+ * Verify that coherence of rightling's leftlink under lock
+ */
+static void bt_recheck_block_rightlink(BtreeCheckState *state, BlockNumber leftblockno)
+{
+	Buffer		lbuffer, rbuffer;
+	Page		lpage,rpage;
+	BTPageOpaque lopaque, ropaque;
+
+	/* Read and lock left block */
+	lbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftblockno, RBM_NORMAL,
+								state->checkstrategy);
+	LockBuffer(lbuffer, BT_READ);
+	lpage = BufferGetPage(lbuffer);
+	lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
+
+	/* Read right block by following lopaque->btpo_next of left block */
+	rbuffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, lopaque->btpo_next, RBM_NORMAL,
+								 state->checkstrategy);
+	/* Here we are going to couple locks on left block and right block */
+	LockBuffer(rbuffer, BT_READ);
+	rpage = BufferGetPage(rbuffer);
+	ropaque = (BTPageOpaque) PageGetSpecialPointer(rpage);
+
+	if (ropaque->btpo_prev != leftblockno)
+		ereport(ERROR,
+				(errcode(ERRCODE_INDEX_CORRUPTED),
+				 errmsg("left link/right link pair in index \"%s\" not in agreement",
+						RelationGetRelationName(state->rel)),
+				 errdetail_internal("Block=%u left block=%u left link from block=%u.",
+									leftblockno, lopaque->btpo_next, ropaque->btpo_prev)));
+
+	UnlockReleaseBuffer(lbuffer);
+	UnlockReleaseBuffer(rbuffer);
+}
+
 /*
  * Given a left-most block at some level, move right, verifying each page
  * individually (with more verification across pages for "readonly"
@@ -713,6 +750,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 		/*
 		 * readonly mode can only ever land on live pages and half-dead pages,
 		 * so sibling pointers should always be in mutual agreement
+		 * if not in readonly mode - we have to recheck links under lock
 		 */
 		if (state->readonly && opaque->btpo_prev != leftcurrent)
 			ereport(ERROR,
@@ -721,6 +759,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
 							RelationGetRelationName(state->rel)),
 					 errdetail_internal("Block=%u left block=%u left link from block=%u.",
 										current, leftcurrent, opaque->btpo_prev)));
+		else if (opaque->btpo_prev != leftcurrent && level.level == 0)
+		/* on leaf level - recheck rightlinks */
+			bt_recheck_block_rightlink(state, leftcurrent);
 
 		/* Check level, which must be valid for non-ignorable page */
 		if (level.level != opaque->btpo.level)
-- 
2.20.1

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrey Borodin (#6)
Re: Do not check unlogged indexes on standby

On 2019-Aug-15, Andrey Borodin wrote:

PFA V1 of this check retry.

CFbot complains that this doesn't apply; can you please rebase?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Alvaro Herrera (#7)
Re: Do not check unlogged indexes on standby

The patch has been committed already.

Peter Geoghegan
(Sent from my phone)

In reply to: Peter Geoghegan (#8)
Re: Do not check unlogged indexes on standby

On Wed, Sep 11, 2019 at 7:10 PM Peter Geoghegan <pg@bowt.ie> wrote:

The patch has been committed already.

Oh, wait. It hasn't. Andrey didn't create a new thread for his largely
independent patch, so I incorrectly assumed he created a CF entry for
his original bugfix.

--
Peter Geoghegan

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#9)
Re: Do not check unlogged indexes on standby

On 2019-Sep-11, Peter Geoghegan wrote:

On Wed, Sep 11, 2019 at 7:10 PM Peter Geoghegan <pg@bowt.ie> wrote:

The patch has been committed already.

Oh, wait. It hasn't. Andrey didn't create a new thread for his largely
independent patch, so I incorrectly assumed he created a CF entry for
his original bugfix.

So, I'm confused. There appear to be two bugfix patches in this thread,
with no relationship between them, and as far as I can tell only one of
them has been addressed. What was applied (6754fe65a4c6) is
significantly different from what Andrey submitted. Is that correct?
If so, we still have an open bug, right?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Alvaro Herrera (#10)
Re: Do not check unlogged indexes on standby

On Wed, Feb 5, 2020 at 1:27 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

So, I'm confused. There appear to be two bugfix patches in this thread,
with no relationship between them, and as far as I can tell only one of
them has been addressed. What was applied (6754fe65a4c6) is
significantly different from what Andrey submitted. Is that correct?
If so, we still have an open bug, right?

No. We had two separate patches on this thread:

1. A bugfix patch to make amcheck not do the wrong thing with unlogged
indexes when operating on a standby.

2. An unrelated feature/enhancement that would allow amcheck to detect
more types of corruption with only an AccessShareLock on the relation.

The first item was dealt with way back in August, without controversy
-- my commit 6754fe65 was more or less Andrey's bugfix.

The second item genereated another thread a little after this thread.
Everything was handled on this other thread. Ultimately, I rejected
the enhancement on the grounds that it wasn't safe on standbys in the
face of concurrent splits and deletions [1]/messages/by-id/CAH2-Wzmb_QOmHX=uWjCFV4Gf1810kz-yVzK6RA=VS41EFcKh=g@mail.gmail.com -- Peter Geoghegan.

I believe that all of the items discussed on this thread have been
resolved. Did I miss a CF entry or something?

[1]: /messages/by-id/CAH2-Wzmb_QOmHX=uWjCFV4Gf1810kz-yVzK6RA=VS41EFcKh=g@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#11)
Re: Do not check unlogged indexes on standby

On 2020-Feb-05, Peter Geoghegan wrote:

The second item genereated another thread a little after this thread.
Everything was handled on this other thread. Ultimately, I rejected
the enhancement on the grounds that it wasn't safe on standbys in the
face of concurrent splits and deletions [1].

I believe that all of the items discussed on this thread have been
resolved. Did I miss a CF entry or something?

Nah. I just had one of the messages flagged in my inbox, and I wasn't
sure what had happened since the other thread was not referenced in this
one. I wasn't looking at any CF entries.

Thanks for the explanation,

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services