amcheck verification for GiST and GIN
Hello! My name is Grigory Kryachko, and I have decided to join efforts with
Andrey Borodin in his work on amcheck.
Here is the patch which I (with Andrey as my advisor) built on top of
the last patch from this thread: https://commitfest.postgresql.org/25/1800/.
It adds the ability to verify the validity of a GIN index. It is not polished
yet, but it works, and we wanted to show it to you so that you can give us
some feedback, and also to let you know about this work in case you have any
plans of writing something like that yourselves, so that you do not redo what
is already done.
In the above-mentioned thread there was an open issue about the right type of
lock; we have not addressed it yet. Right now I am primarily interested in
feedback on the GIN part.
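For reference, after updating the extension to 1.3 the new checks are invoked
like this (this mirrors the regression tests in the patch; the index names
below are placeholders):

    CREATE EXTENSION IF NOT EXISTS amcheck;
    ALTER EXTENSION amcheck UPDATE TO '1.3';

    -- verify parent/child key consistency and basic tree-shape invariants
    SELECT gist_index_parent_check('some_gist_index');
    SELECT gin_index_parent_check('some_gin_index');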
Attachments:
amchek_gin_gist.patch (text/x-patch, charset US-ASCII)
From 3f70942f9e84e594fe31784c1da35c6939d919ba Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Tue, 10 Mar 2020 20:42:15 +0500
Subject: [PATCH 01/15] apply Andrey's patch
---
contrib/amcheck/Makefile | 9 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 14 +
contrib/amcheck/amcheck.c | 160 ++++++++++
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 23 ++
contrib/amcheck/expected/check_gist.out | 18 ++
contrib/amcheck/sql/check_gist.sql | 9 +
contrib/amcheck/verify_gist.c | 374 ++++++++++++++++++++++++
contrib/amcheck/verify_nbtree.c | 174 ++---------
doc/src/sgml/amcheck.sgml | 19 ++
10 files changed, 653 insertions(+), 149 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.c
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_gist.out
create mode 100644 contrib/amcheck/sql/check_gist.sql
create mode 100644 contrib/amcheck/verify_gist.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..df7b15375f 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,18 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ amcheck.o \
+ verify_gist.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql \
+ amcheck--1.1--1.2.sql \
+ amcheck--1.0--1.1.sql \
+ amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_gist
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..44b88a40a0
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,14 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- gist_index_parent_check()
+--
+CREATE FUNCTION gist_index_parent_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'gist_index_parent_check'
+LANGUAGE C STRICT;
+
+REVOKE ALL ON FUNCTION gist_index_parent_check(regclass) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
new file mode 100644
index 0000000000..5bc3f831bc
--- /dev/null
+++ b/contrib/amcheck/amcheck.c
@@ -0,0 +1,160 @@
+/*-------------------------------------------------------------------------
+ *
+ * amcheck.c
+ * Utility functions common to all access methods.
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/table.h"
+#include "access/tableam.h"
+#include "amcheck.h"
+#include "catalog/index.h"
+#include "commands/tablecmds.h"
+
+
+/*
+ * Lock acquisition reused across different am types
+ */
+void
+amcheck_lock_relation(Oid indrelid, Relation *indrel,
+ Relation *heaprel, LOCKMODE lockmode)
+{
+ Oid heapid;
+
+ /*
+ * We must lock table before index to avoid deadlocks. However, if the
+ * passed indrelid isn't an index then IndexGetRelation() will fail.
+ * Rather than emitting a not-very-helpful error message, postpone
+ * complaining, expecting that the is-it-an-index test below will fail.
+ *
+ * In hot standby mode this will raise an error when lockmode is
+ * AccessExclusiveLock.
+ */
+ heapid = IndexGetRelation(indrelid, true);
+ if (OidIsValid(heapid))
+ *heaprel = table_open(heapid, lockmode);
+ else
+ *heaprel = NULL;
+
+ /*
+ * Open the target index relations separately (like relation_openrv(), but
+ * with heap relation locked first to prevent deadlocking). In hot
+ * standby mode this will raise an error when lockmode is
+ * AccessExclusiveLock.
+ *
+ * There is no need for the usual indcheckxmin usability horizon test
+ * here, even in the nbtree heapallindexed case, because index undergoing
+ * verification only needs to have entries for a new transaction snapshot.
+ * (If caller is about to do nbtree parentcheck verification, there is no
+ * question about committed or recently dead heap tuples lacking index
+ * entries due to concurrent activity.)
+ */
+ *indrel = index_open(indrelid, lockmode);
+
+ /*
+ * Since we did the IndexGetRelation call above without any lock, it's
+ * barely possible that a race against an index drop/recreation could have
+ * netted us the wrong table.
+ */
+ if (*heaprel == NULL || heapid != IndexGetRelation(indrelid, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_TABLE),
+ errmsg("could not open parent table of index %s",
+ RelationGetRelationName(*indrel))));
+}
+
+/*
+ * Unlock index and heap relations early for amcheck_lock_relation() caller.
+ *
+ * This is ok because nothing in the called routines will trigger shared cache
+ * invalidations to be sent, so we can relax the usual pattern of only
+ * releasing locks after commit.
+ */
+void
+amcheck_unlock_relation(Oid indrelid, Relation indrel, Relation heaprel,
+ LOCKMODE lockmode)
+{
+ index_close(indrel, lockmode);
+ if (heaprel)
+ table_close(heaprel, lockmode);
+}
+
+/*
+ * Check if index relation should have a file for its main relation
+ * fork. Verification uses this to skip unlogged indexes when in hot standby
+ * mode, where there is simply nothing to verify.
+ *
+ * NB: Caller should call btree_index_checkable() or gist_index_checkable()
+ * before calling here.
+ */
+bool
+amcheck_index_mainfork_expected(Relation rel)
+{
+ if (rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED ||
+ !RecoveryInProgress())
+ return true;
+
+ ereport(NOTICE,
+ (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+ errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",
+ RelationGetRelationName(rel))));
+
+ return false;
+}
+
+/*
+ * PageGetItemId() wrapper that validates returned line pointer.
+ *
+ * Buffer page/page item access macros generally trust that line pointers are
+ * not corrupt, which might cause problems for verification itself. For
+ * example, there is no bounds checking in PageGetItem(). Passing it a
+ * corrupt line pointer can cause it to return a tuple/pointer that is unsafe
+ * to dereference.
+ *
+ * Validating line pointers before tuples avoids undefined behavior and
+ * assertion failures with corrupt indexes, making the verification process
+ * more robust and predictable.
+ */
+ItemId
+PageGetItemIdCareful(Relation rel, BlockNumber block, Page page,
+ OffsetNumber offset, size_t opaquesize)
+{
+ ItemId itemid = PageGetItemId(page, offset);
+
+ if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
+ BLCKSZ - opaquesize)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("line pointer points past end of tuple space in index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid))));
+
+ /*
+ * Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
+ * and GiST never use either. Verify that line pointer has storage, too,
+ * since even LP_DEAD items should.
+ */
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
+ ItemIdGetLength(itemid) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("invalid line pointer storage in index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid))));
+
+ return itemid;
+}
\ No newline at end of file
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..b19e617177
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * amcheck.h
+ * Shared routines for amcheck verifications.
+ *
+ * Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "storage/lockdefs.h"
+#include "utils/relcache.h"
+
+extern void amcheck_lock_relation(Oid indrelid, Relation *indrel,
+ Relation *heaprel, LOCKMODE lockmode);
+extern void amcheck_unlock_relation(Oid indrelid, Relation indrel,
+ Relation heaprel, LOCKMODE lockmode);
+extern bool amcheck_index_mainfork_expected(Relation rel);
+
+extern ItemId PageGetItemIdCareful(Relation rel, BlockNumber block,
+ Page page, OffsetNumber offset, size_t opaquesize);
\ No newline at end of file
diff --git a/contrib/amcheck/expected/check_gist.out b/contrib/amcheck/expected/check_gist.out
new file mode 100644
index 0000000000..8f3ec20946
--- /dev/null
+++ b/contrib/amcheck/expected/check_gist.out
@@ -0,0 +1,18 @@
+-- minimal test, basically just verifying that amcheck works with GiST
+SELECT setseed(1);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE gist_check AS SELECT point(random(),s) c FROM generate_series(1,10000) s;
+INSERT INTO gist_check SELECT point(random(),s) c FROM generate_series(1,100000) s;
+CREATE INDEX gist_check_idx ON gist_check USING gist(c);
+SELECT gist_index_parent_check('gist_check_idx');
+ gist_index_parent_check
+-------------------------
+
+(1 row)
+
+-- cleanup
+DROP TABLE gist_check;
diff --git a/contrib/amcheck/sql/check_gist.sql b/contrib/amcheck/sql/check_gist.sql
new file mode 100644
index 0000000000..0ee2e943c9
--- /dev/null
+++ b/contrib/amcheck/sql/check_gist.sql
@@ -0,0 +1,9 @@
+-- minimal test, basically just verifying that amcheck works with GiST
+SELECT setseed(1);
+CREATE TABLE gist_check AS SELECT point(random(),s) c FROM generate_series(1,10000) s;
+INSERT INTO gist_check SELECT point(random(),s) c FROM generate_series(1,100000) s;
+CREATE INDEX gist_check_idx ON gist_check USING gist(c);
+SELECT gist_index_parent_check('gist_check_idx');
+
+-- cleanup
+DROP TABLE gist_check;
diff --git a/contrib/amcheck/verify_gist.c b/contrib/amcheck/verify_gist.c
new file mode 100644
index 0000000000..3c741071f0
--- /dev/null
+++ b/contrib/amcheck/verify_gist.c
@@ -0,0 +1,374 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_gist.c
+ * Verifies the integrity of GiST indexes based on invariants.
+ *
+ * Verification checks that all paths in GiST graph contain
+ * consistent keys: tuples on parent pages consistently include tuples
+ * from child pages. Also, verification checks graph invariants: an
+ * internal page must have at least one downlink, and an internal page can
+ * reference either only leaf pages or only internal pages.
+ *
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/verify_gist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/gist_private.h"
+#include "amcheck.h"
+#include "catalog/pg_am.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/*
+ * GistScanItem represents one item of depth-first scan of GiST index.
+ */
+typedef struct GistScanItem
+{
+ int depth;
+ IndexTuple parenttup;
+ BlockNumber parentblk;
+ XLogRecPtr parentlsn;
+ BlockNumber blkno;
+ struct GistScanItem *next;
+} GistScanItem;
+
+PG_FUNCTION_INFO_V1(gist_index_parent_check);
+
+static void gist_index_checkable(Relation rel);
+static void gist_check_parent_keys_consistency(Relation rel);
+static void check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo);
+static IndexTuple gist_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno,
+ BufferAccessStrategy strategy);
+
+/*
+ * gist_index_parent_check(index regclass)
+ *
+ * Verify integrity of GiST index.
+ *
+ * Acquires AccessShareLock on heap & index relations.
+ */
+Datum gist_index_parent_check(PG_FUNCTION_ARGS)
+{
+ Oid indrelid = PG_GETARG_OID(0);
+ Relation indrel;
+ Relation heaprel;
+ LOCKMODE lockmode = AccessShareLock;
+
+ /* lock table and index at the necessary level */
+ amcheck_lock_relation(indrelid, &indrel, &heaprel, lockmode);
+
+ /* verify that this is GiST eligible for check */
+ gist_index_checkable(indrel);
+
+ if (amcheck_index_mainfork_expected(indrel))
+ gist_check_parent_keys_consistency(indrel);
+
+ /* Unlock index and table */
+ amcheck_unlock_relation(indrelid, indrel, heaprel, lockmode);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Check that relation is eligible for GiST verification
+ */
+static void
+gist_index_checkable(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_INDEX ||
+ rel->rd_rel->relam != GIST_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only GiST indexes are supported as targets for this verification"),
+ errdetail("Relation \"%s\" is not a GiST index.",
+ RelationGetRelationName(rel))));
+
+ if (RELATION_IS_OTHER_TEMP(rel))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot access temporary tables of other sessions"),
+ errdetail("Index \"%s\" is associated with temporary relation.",
+ RelationGetRelationName(rel))));
+
+ if (!rel->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot check index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Index is not valid")));
+}
+
+/*
+ * Main entry point for GiST check. Allocates memory context and scans through
+ * GiST graph. This function verifies that tuples of internal pages cover
+ * all the key space of each tuple on a leaf page. To do this we walk every
+ * internal page and compare each downlink tuple with the tuples on the
+ * child page it points to.
+ *
+ * The comparison uses gistgetadjusted(): a parent GiST tuple should never
+ * require any adjustment to cover the tuples on its child page.
+ */
+static void
+gist_check_parent_keys_consistency(Relation rel)
+{
+ BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
+ GistScanItem *stack;
+ MemoryContext mctx;
+ MemoryContext oldcontext;
+ GISTSTATE *state;
+ int leafdepth;
+
+ mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(mctx);
+
+ state = initGISTstate(rel);
+
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
+ leafdepth = -1;
+
+ /* Start the scan at the root page */
+ stack = (GistScanItem *) palloc0(sizeof(GistScanItem));
+ stack->depth = 0;
+ stack->parenttup = NULL;
+ stack->parentblk = InvalidBlockNumber;
+ stack->parentlsn = InvalidXLogRecPtr;
+ stack->blkno = GIST_ROOT_BLKNO;
+
+ while (stack)
+ {
+ GistScanItem *stack_next;
+ Buffer buffer;
+ Page page;
+ OffsetNumber i, maxoff;
+ XLogRecPtr lsn;
+
+ CHECK_FOR_INTERRUPTS();
+
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
+ RBM_NORMAL, strategy);
+ LockBuffer(buffer, GIST_SHARE);
+ page = (Page) BufferGetPage(buffer);
+ lsn = BufferGetLSNAtomic(buffer);
+
+ /* Do basic sanity checks on the page headers */
+ check_index_page(rel, buffer, stack->blkno);
+
+ /*
+ * It's possible that the page was split since we looked at the
+ * parent, so that we may have missed the downlink of the right sibling
+ * when we scanned the parent. If so, add the right sibling to the
+ * stack now.
+ */
+ if (GistFollowRight(page) || stack->parentlsn < GistPageGetNSN(page))
+ {
+ /* split page detected, install right link to the stack */
+ GistScanItem *ptr = (GistScanItem *) palloc(sizeof(GistScanItem));
+
+ ptr->depth = stack->depth;
+ ptr->parenttup = CopyIndexTuple(stack->parenttup);
+ ptr->parentblk = stack->parentblk;
+ ptr->parentlsn = stack->parentlsn;
+ ptr->blkno = GistPageGetOpaque(page)->rightlink;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+
+ /* Check that the tree has the same height in all branches */
+ if (GistPageIsLeaf(page))
+ {
+ if (leafdepth == -1)
+ leafdepth = stack->depth;
+ else if (stack->depth != leafdepth)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
+
+ /*
+ * Check that each tuple looks valid, and is consistent with the
+ * downlink we followed when we stepped on this page.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i, sizeof(GISTPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+
+ /*
+ * Check that it's not a leftover invalid tuple from pre-9.1. See
+ * also gistdoinsert() and gistbulkdelete() handling of such
+ * tuples. We do consider it an error here.
+ */
+ if (GistTupleIsInvalid(idxtuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("index \"%s\" contains an inner tuple marked as invalid, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i),
+ errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
+ errhint("Please REINDEX it.")));
+
+ if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent tuple sizes, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+ if (stack->parenttup &&
+ gistgetadjusted(rel, stack->parenttup, idxtuple, state))
+ {
+ /*
+ * There was a discrepancy between parent and child tuples.
+ * We need to verify it is not a result of concurrent call of
+ * gistplacetopage(). So, lock parent and try to find downlink
+ * for current page. It may be missing due to concurrent page
+ * split, this is OK.
+ */
+ pfree(stack->parenttup);
+ stack->parenttup = gist_refind_parent(rel, stack->parentblk,
+ stack->blkno, strategy);
+
+ /* If a downlink was re-found, make a final check before failing */
+ if (!stack->parenttup)
+ elog(NOTICE, "Unable to find parent tuple for block %u on block %u due to concurrent split",
+ stack->blkno, stack->parentblk);
+ else if (gistgetadjusted(rel, stack->parenttup, idxtuple, state))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ else
+ {
+ /*
+ * But now it is properly adjusted - nothing to do here.
+ */
+ }
+ }
+
+ /* If this is an internal page, recurse into the child */
+ if (!GistPageIsLeaf(page))
+ {
+ GistScanItem *ptr;
+
+ ptr = (GistScanItem *) palloc(sizeof(GistScanItem));
+ ptr->depth = stack->depth + 1;
+ ptr->parenttup = CopyIndexTuple(idxtuple);
+ ptr->parentblk = stack->blkno;
+ ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
+ ptr->parentlsn = lsn;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+ }
+
+ LockBuffer(buffer, GIST_UNLOCK);
+ ReleaseBuffer(buffer);
+
+ /* Step to next item in the queue */
+ stack_next = stack->next;
+ if (stack->parenttup)
+ pfree(stack->parenttup);
+ pfree(stack);
+ stack = stack_next;
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(mctx);
+}
+
+static void
+check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
+{
+ Page page = BufferGetPage(buffer);
+
+ gistcheckpage(rel, buffer);
+
+ if (GistPageGetOpaque(page)->gist_page_id != GIST_PAGE_ID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has corrupted page %d",
+ RelationGetRelationName(rel), blockNo)));
+
+ if (GistPageIsDeleted(page))
+ {
+ if (!GistPageIsLeaf(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted internal page %d",
+ RelationGetRelationName(rel), blockNo)));
+ if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted page %d with tuples",
+ RelationGetRelationName(rel), blockNo)));
+ }
+ else if (PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has page %d with exceeding count of tuples",
+ RelationGetRelationName(rel), blockNo)));
+}
+
+/*
+ * Try to re-find downlink pointing to 'blkno', in 'parentblkno'.
+ *
+ * If found, returns a palloc'd copy of the downlink tuple. Otherwise,
+ * returns NULL.
+ */
+static IndexTuple
+gist_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno, BufferAccessStrategy strategy)
+{
+ Buffer parentbuf;
+ Page parentpage;
+ OffsetNumber o,
+ parent_maxoff;
+ IndexTuple result = NULL;
+
+ parentbuf = ReadBufferExtended(rel, MAIN_FORKNUM, parentblkno, RBM_NORMAL,
+ strategy);
+
+ LockBuffer(parentbuf, GIST_SHARE);
+ parentpage = BufferGetPage(parentbuf);
+
+ if (GistPageIsLeaf(parentpage))
+ {
+ UnlockReleaseBuffer(parentbuf);
+ return result;
+ }
+
+ parent_maxoff = PageGetMaxOffsetNumber(parentpage);
+ for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(o))
+ {
+ ItemId p_iid = PageGetItemIdCareful(rel, parentblkno, parentpage, o, sizeof(GISTPageOpaqueData));
+ IndexTuple itup = (IndexTuple) PageGetItem(parentpage, p_iid);
+
+ if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
+ {
+ /* Found it! Make copy and return it */
+ result = CopyIndexTuple(itup);
+ break;
+ }
+ }
+
+ UnlockReleaseBuffer(parentbuf);
+
+ return result;
+}
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 31717321b0..c352803bd7 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -25,16 +25,14 @@
#include "access/htup_details.h"
#include "access/nbtree.h"
-#include "access/table.h"
#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
+#include "amcheck.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
-#include "commands/tablecmds.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
-#include "storage/lmgr.h"
#include "storage/smgr.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -129,7 +127,6 @@ PG_FUNCTION_INFO_V1(bt_index_parent_check);
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
bool heapallindexed, bool rootdescend);
static inline void btree_index_checkable(Relation rel);
-static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
bool rootdescend);
@@ -164,8 +161,6 @@ static inline bool invariant_l_nontarget_offset(BtreeCheckState *state,
static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum);
static inline BTScanInsert bt_mkscankey_pivotsearch(Relation rel,
IndexTuple itup);
-static ItemId PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block,
- Page page, OffsetNumber offset);
static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
@@ -226,7 +221,6 @@ static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
bool rootdescend)
{
- Oid heapid;
Relation indrel;
Relation heaprel;
LOCKMODE lockmode;
@@ -236,49 +230,13 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
else
lockmode = AccessShareLock;
- /*
- * We must lock table before index to avoid deadlocks. However, if the
- * passed indrelid isn't an index then IndexGetRelation() will fail.
- * Rather than emitting a not-very-helpful error message, postpone
- * complaining, expecting that the is-it-an-index test below will fail.
- *
- * In hot standby mode this will raise an error when parentcheck is true.
- */
- heapid = IndexGetRelation(indrelid, true);
- if (OidIsValid(heapid))
- heaprel = table_open(heapid, lockmode);
- else
- heaprel = NULL;
-
- /*
- * Open the target index relations separately (like relation_openrv(), but
- * with heap relation locked first to prevent deadlocking). In hot
- * standby mode this will raise an error when parentcheck is true.
- *
- * There is no need for the usual indcheckxmin usability horizon test
- * here, even in the heapallindexed case, because index undergoing
- * verification only needs to have entries for a new transaction snapshot.
- * (If this is a parentcheck verification, there is no question about
- * committed or recently dead heap tuples lacking index entries due to
- * concurrent activity.)
- */
- indrel = index_open(indrelid, lockmode);
-
- /*
- * Since we did the IndexGetRelation call above without any lock, it's
- * barely possible that a race against an index drop/recreation could have
- * netted us the wrong table.
- */
- if (heaprel == NULL || heapid != IndexGetRelation(indrelid, false))
- ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_TABLE),
- errmsg("could not open parent table of index %s",
- RelationGetRelationName(indrel))));
+ /* lock table and index at the necessary level */
+ amcheck_lock_relation(indrelid, &indrel, &heaprel, lockmode);
/* Relation suitable for checking as B-Tree? */
btree_index_checkable(indrel);
- if (btree_index_mainfork_expected(indrel))
+ if (amcheck_index_mainfork_expected(indrel))
{
bool heapkeyspace,
allequalimage;
@@ -296,14 +254,8 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
heapallindexed, rootdescend);
}
- /*
- * Release locks early. That's ok here because nothing in the called
- * routines will trigger shared cache invalidations to be sent, so we can
- * relax the usual pattern of only releasing locks after commit.
- */
- index_close(indrel, lockmode);
- if (heaprel)
- table_close(heaprel, lockmode);
+ /* Unlock index and table */
+ amcheck_unlock_relation(indrelid, indrel, heaprel, lockmode);
}
/*
@@ -340,28 +292,6 @@ btree_index_checkable(Relation rel)
errdetail("Index is not valid.")));
}
-/*
- * Check if B-Tree index relation should have a file for its main relation
- * fork. Verification uses this to skip unlogged indexes when in hot standby
- * mode, where there is simply nothing to verify.
- *
- * NB: Caller should call btree_index_checkable() before calling here.
- */
-static inline bool
-btree_index_mainfork_expected(Relation rel)
-{
- if (rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED ||
- !RecoveryInProgress())
- return true;
-
- ereport(NOTICE,
- (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
- errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",
- RelationGetRelationName(rel))));
-
- return false;
-}
-
/*
* Main entry point for B-Tree SQL-callable functions. Walks the B-Tree in
* logical order, verifying invariants as it goes. Optionally, verification
@@ -757,9 +687,9 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
ItemId itemid;
/* Internal page -- downlink gets leftmost on next level */
- itemid = PageGetItemIdCareful(state, state->targetblock,
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock,
state->target,
- P_FIRSTDATAKEY(opaque));
+ P_FIRSTDATAKEY(opaque), sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(state->target, itemid);
nextleveldown.leftmost = BTreeTupleGetDownLink(itup);
nextleveldown.level = opaque->btpo.level - 1;
@@ -894,8 +824,8 @@ bt_target_page_check(BtreeCheckState *state)
IndexTuple itup;
/* Verify line pointer before checking tuple */
- itemid = PageGetItemIdCareful(state, state->targetblock,
- state->target, P_HIKEY);
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock,
+ state->target, P_HIKEY, sizeof(BTPageOpaqueData));
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
P_HIKEY))
{
@@ -931,8 +861,8 @@ bt_target_page_check(BtreeCheckState *state)
CHECK_FOR_INTERRUPTS();
- itemid = PageGetItemIdCareful(state, state->targetblock,
- state->target, offset);
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock,
+ state->target, offset, sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(state->target, itemid);
tupsize = IndexTupleSize(itup);
@@ -1246,9 +1176,9 @@ bt_target_page_check(BtreeCheckState *state)
OffsetNumberNext(offset));
/* Reuse itup to get pointed-to heap location of second item */
- itemid = PageGetItemIdCareful(state, state->targetblock,
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock,
state->target,
- OffsetNumberNext(offset));
+ OffsetNumberNext(offset), sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(state->target, itemid);
tid = BTreeTupleGetPointsToTID(itup);
nhtid = psprintf("(%u,%u)",
@@ -1534,8 +1464,8 @@ bt_right_page_check_scankey(BtreeCheckState *state)
if (P_ISLEAF(opaque) && nline >= P_FIRSTDATAKEY(opaque))
{
/* Return first data item (if any) */
- rightitem = PageGetItemIdCareful(state, targetnext, rightpage,
- P_FIRSTDATAKEY(opaque));
+ rightitem = PageGetItemIdCareful(state->rel, targetnext, rightpage,
+ P_FIRSTDATAKEY(opaque), sizeof(BTPageOpaqueData));
}
else if (!P_ISLEAF(opaque) &&
nline >= OffsetNumberNext(P_FIRSTDATAKEY(opaque)))
@@ -1544,8 +1474,9 @@ bt_right_page_check_scankey(BtreeCheckState *state)
* Return first item after the internal page's "negative infinity"
* item
*/
- rightitem = PageGetItemIdCareful(state, targetnext, rightpage,
- OffsetNumberNext(P_FIRSTDATAKEY(opaque)));
+ rightitem = PageGetItemIdCareful(state->rel, targetnext, rightpage,
+ OffsetNumberNext(P_FIRSTDATAKEY(opaque)),
+ sizeof(BTPageOpaqueData));
}
else
{
@@ -1817,8 +1748,8 @@ bt_downlink_missing_check(BtreeCheckState *state)
RelationGetRelationName(state->rel));
level = topaque->btpo.level;
- itemid = PageGetItemIdCareful(state, state->targetblock, state->target,
- P_FIRSTDATAKEY(topaque));
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock, state->target,
+ P_FIRSTDATAKEY(topaque), sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(state->target, itemid);
childblk = BTreeTupleGetDownLink(itup);
for (;;)
@@ -1842,8 +1773,8 @@ bt_downlink_missing_check(BtreeCheckState *state)
level - 1, copaque->btpo.level)));
level = copaque->btpo.level;
- itemid = PageGetItemIdCareful(state, childblk, child,
- P_FIRSTDATAKEY(copaque));
+ itemid = PageGetItemIdCareful(state->rel, childblk, child,
+ P_FIRSTDATAKEY(copaque), sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(child, itemid);
childblk = BTreeTupleGetDownLink(itup);
/* Be slightly more pro-active in freeing this memory, just in case */
@@ -1892,7 +1823,7 @@ bt_downlink_missing_check(BtreeCheckState *state)
*/
if (P_ISHALFDEAD(copaque) && !P_RIGHTMOST(copaque))
{
- itemid = PageGetItemIdCareful(state, childblk, child, P_HIKEY);
+ itemid = PageGetItemIdCareful(state->rel, childblk, child, P_HIKEY, sizeof(BTPageOpaqueData));
itup = (IndexTuple) PageGetItem(child, itemid);
if (BTreeTupleGetTopParent(itup) == state->targetblock)
return;
@@ -2263,8 +2194,8 @@ invariant_l_offset(BtreeCheckState *state, BTScanInsert key,
Assert(key->pivotsearch);
/* Verify line pointer before checking tuple */
- itemid = PageGetItemIdCareful(state, state->targetblock, state->target,
- upperbound);
+ itemid = PageGetItemIdCareful(state->rel, state->targetblock, state->target,
+ upperbound, sizeof(BTPageOpaqueData));
/* pg_upgrade'd indexes may legally have equal sibling tuples */
if (!key->heapkeyspace)
return invariant_leq_offset(state, key, upperbound);
@@ -2386,8 +2317,8 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
Assert(key->pivotsearch);
/* Verify line pointer before checking tuple */
- itemid = PageGetItemIdCareful(state, nontargetblock, nontarget,
- upperbound);
+ itemid = PageGetItemIdCareful(state->rel, nontargetblock, nontarget,
+ upperbound, sizeof(BTPageOpaqueData));
cmp = _bt_compare(state->rel, key, nontarget, upperbound);
/* pg_upgrade'd indexes may legally have equal sibling tuples */
@@ -2600,55 +2531,6 @@ bt_mkscankey_pivotsearch(Relation rel, IndexTuple itup)
return skey;
}
-/*
- * PageGetItemId() wrapper that validates returned line pointer.
- *
- * Buffer page/page item access macros generally trust that line pointers are
- * not corrupt, which might cause problems for verification itself. For
- * example, there is no bounds checking in PageGetItem(). Passing it a
- * corrupt line pointer can cause it to return a tuple/pointer that is unsafe
- * to dereference.
- *
- * Validating line pointers before tuples avoids undefined behavior and
- * assertion failures with corrupt indexes, making the verification process
- * more robust and predictable.
- */
-static ItemId
-PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
- OffsetNumber offset)
-{
- ItemId itemid = PageGetItemId(page, offset);
-
- if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
- BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
-
- /*
- * Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
- * never uses either. Verify that line pointer has storage, too, since
- * even LP_DEAD items should within nbtree.
- */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
- ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
-
- return itemid;
-}
-
/*
* BTreeTupleGetHeapTID() wrapper that enforces that a heap TID is present in
* cases where that is mandatory (i.e. for non-pivot tuples)
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index fe0fe9c186..aba096dce0 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -165,6 +165,25 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>gist_index_parent_check(index regclass) returns void</function>
+ <indexterm>
+ <primary>gist_index_parent_check</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>gist_index_parent_check</function> tests that its target GiST
+ index has consistent parent-child tuple relations (no parent tuple
+ requires adjustment) and that the page graph respects balanced-tree
+ invariants (internal pages reference either only leaf pages or only
+ internal pages).
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
--
2.17.1
From 646e7041f0d317fd613db62ed5f6d282826bdec0 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 22 Mar 2020 20:20:08 +0500
Subject: [PATCH 02/15] Add very basic check (just walks the tree and does
nothing)
---
contrib/amcheck/Makefile | 3 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 10 +
contrib/amcheck/expected/check_gin.out | 18 ++
contrib/amcheck/sql/check_gin.sql | 9 +
contrib/amcheck/verify_gin.c | 250 +++++++++++++++++++++++++
5 files changed, 289 insertions(+), 1 deletion(-)
create mode 100644 contrib/amcheck/expected/check_gin.out
create mode 100644 contrib/amcheck/sql/check_gin.sql
create mode 100644 contrib/amcheck/verify_gin.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index df7b15375f..fd9a4b699e 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -4,6 +4,7 @@ MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
amcheck.o \
+ verify_gin.o \
verify_gist.o \
verify_nbtree.o
@@ -14,7 +15,7 @@ DATA = amcheck--1.2--1.3.sql \
amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree check_gist
+REGRESS = check check_btree check_gin check_gist
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
index 44b88a40a0..068602b89c 100644
--- a/contrib/amcheck/amcheck--1.2--1.3.sql
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -12,3 +12,13 @@ AS 'MODULE_PATHNAME', 'gist_index_parent_check'
LANGUAGE C STRICT;
REVOKE ALL ON FUNCTION gist_index_parent_check(regclass) FROM PUBLIC;
+
+--
+-- gin_index_parent_check()
+--
+CREATE FUNCTION gin_index_parent_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'gin_index_parent_check'
+LANGUAGE C STRICT;
+
+REVOKE ALL ON FUNCTION gin_index_parent_check(regclass) FROM PUBLIC;
diff --git a/contrib/amcheck/expected/check_gin.out b/contrib/amcheck/expected/check_gin.out
new file mode 100644
index 0000000000..ff415ecf9f
--- /dev/null
+++ b/contrib/amcheck/expected/check_gin.out
@@ -0,0 +1,18 @@
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE "gin_check"("Column1" int[]);
+INSERT INTO gin_check select array_agg(round(random()*100)) from generate_series(1, 1000000) as i group by i % 100000;
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_idx');
+ gin_index_parent_check
+-------------------------
+
+(1 row)
+
+-- cleanup
+DROP TABLE gin_check;
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
new file mode 100644
index 0000000000..789813accf
--- /dev/null
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -0,0 +1,9 @@
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check"("Column1" int[]);
+INSERT INTO gin_check select array_agg(round(random()*100)) from generate_series(1, 1000000) as i group by i % 100000;
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_idx');
+
+-- cleanup
+DROP TABLE gin_check;
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
new file mode 100644
index 0000000000..0d5816e7c9
--- /dev/null
+++ b/contrib/amcheck/verify_gin.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_gin.c
+ * Verifies the integrity of GIN indexes based on invariants.
+ *
+ * Verification checks that all paths in GIN graph contain
+ * consistent keys: tuples on parent pages consistently include tuples
+ * from child pages. Also, verification checks graph invariants: an
+ * internal page must have at least one downlink, and an internal page can
+ * reference either only leaf pages or only internal pages.
+ *
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/verify_gin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "amcheck.h"
+#include "catalog/pg_am.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/*
+ * GinScanItem represents one item of depth-first scan of GIN index.
+ */
+typedef struct GinScanItem
+{
+ int depth;
+ IndexTuple parenttup;
+ BlockNumber parentblk;
+ XLogRecPtr parentlsn;
+ BlockNumber blkno;
+ struct GinScanItem *next;
+} GinScanItem;
+
+PG_FUNCTION_INFO_V1(gin_index_parent_check);
+
+static void gin_index_checkable(Relation rel);
+static void gin_check_parent_keys_consistency(Relation rel);
+static void check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo);
+static IndexTuple gin_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno,
+ BufferAccessStrategy strategy);
+
+/*
+ * gin_index_parent_check(index regclass)
+ *
+ * Verify integrity of GIN index.
+ *
+ * Acquires AccessShareLock on heap & index relations.
+ */
+Datum gin_index_parent_check(PG_FUNCTION_ARGS)
+{
+ Oid indrelid = PG_GETARG_OID(0);
+ Relation indrel;
+ Relation heaprel;
+ LOCKMODE lockmode = AccessShareLock;
+
+ /* lock table and index at the necessary level */
+ amcheck_lock_relation(indrelid, &indrel, &heaprel, lockmode);
+
+ /* verify that this is GIN eligible for check */
+ gin_index_checkable(indrel);
+
+ if (amcheck_index_mainfork_expected(indrel))
+ gin_check_parent_keys_consistency(indrel);
+
+ /* Unlock index and table */
+ amcheck_unlock_relation(indrelid, indrel, heaprel, lockmode);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Check that relation is eligible for GIN verification
+ */
+static void
+gin_index_checkable(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_INDEX ||
+ rel->rd_rel->relam != GIN_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only GIN indexes are supported as targets for this verification"),
+ errdetail("Relation \"%s\" is not a GIN index.",
+ RelationGetRelationName(rel))));
+
+ if (RELATION_IS_OTHER_TEMP(rel))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot access temporary tables of other sessions"),
+ errdetail("Index \"%s\" is associated with temporary relation.",
+ RelationGetRelationName(rel))));
+
+ if (!rel->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot check index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Index is not valid")));
+}
+
+/*
+ * Main entry point for GIN check. Allocates memory context and scans through
+ * GIN graph. This function verifies that tuples of internal pages cover all
+ * the key space of each tuple on leaf page. To do this we invoke
+ * gin_check_internal_page() for every internal page.
+ *
+ * giin_check_internal_page() in it's turn takes every tuple and tries to
+ * adjust it by tuples on referenced child page. Parent gin tuple should
+ * never require any adjustments.
+ */
+static void
+gin_check_parent_keys_consistency(Relation rel)
+{
+ BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
+ GinScanItem *stack;
+ MemoryContext mctx;
+ MemoryContext oldcontext;
+ int leafdepth;
+
+ mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(mctx);
+
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
+ leafdepth = -1;
+
+ /* Start the scan at the root page */
+ stack = (GinScanItem *) palloc0(sizeof(GinScanItem));
+ stack->depth = 0;
+ stack->parenttup = NULL;
+ stack->parentblk = InvalidBlockNumber;
+ stack->parentlsn = InvalidXLogRecPtr;
+ stack->blkno = GIN_ROOT_BLKNO;
+
+ while (stack)
+ {
+ GinScanItem *stack_next;
+ Buffer buffer;
+ Page page;
+ OffsetNumber i, maxoff;
+ XLogRecPtr lsn;
+
+ CHECK_FOR_INTERRUPTS();
+
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
+ RBM_NORMAL, strategy);
+ LockBuffer(buffer, GIN_SHARE);
+ page = (Page) BufferGetPage(buffer);
+ lsn = BufferGetLSNAtomic(buffer);
+
+ /* Do basic sanity checks on the page headers */
+ check_index_page(rel, buffer, stack->blkno);
+
+ /* Check that the tree has the same height in all branches */
+ if (GinPageIsLeaf(page))
+ {
+ if (leafdepth == -1)
+ leafdepth = stack->depth;
+ else if (stack->depth != leafdepth)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
+
+ /*
+ * Check that each tuple looks valid, and is consistent with the
+ * downlink we followed when we stepped on this page.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+
+ if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent tuple sizes, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ /* If this is an internal page, recurse into the child */
+ if (!GinPageIsLeaf(page))
+ {
+ GinScanItem *ptr;
+
+ ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
+ ptr->depth = stack->depth + 1;
+ ptr->parenttup = CopyIndexTuple(idxtuple);
+ ptr->parentblk = stack->blkno;
+ ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
+ ptr->parentlsn = lsn;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+ }
+
+ LockBuffer(buffer, GIN_UNLOCK);
+ ReleaseBuffer(buffer);
+
+ /* Step to next item in the queue */
+ stack_next = stack->next;
+ if (stack->parenttup)
+ pfree(stack->parenttup);
+ pfree(stack);
+ stack = stack_next;
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(mctx);
+}
+
+static void
+check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
+{
+ Page page = BufferGetPage(buffer);
+
+// gistcheckpage(rel, buffer);
+
+ if (GinPageIsDeleted(page))
+ {
+ if (!GinPageIsLeaf(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted internal page %d",
+ RelationGetRelationName(rel), blockNo)));
+ if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted page %d with tuples",
+ RelationGetRelationName(rel), blockNo)));
+ }
+ else if (PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has page %d with exceeding count of tuples",
+ RelationGetRelationName(rel), blockNo)));
+}
--
2.17.1
From 7faf8fc5946e0573de3191d1f0ff49e03dd7adf3 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 29 Mar 2020 15:10:18 +0500
Subject: [PATCH 03/15] Add split detection
---
contrib/amcheck/verify_gin.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 0d5816e7c9..335c7eb6d3 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -163,6 +163,28 @@ gin_check_parent_keys_consistency(Relation rel)
/* Do basic sanity checks on the page headers */
check_index_page(rel, buffer, stack->blkno);
+ /*
+ * It's possible that the page was split since we looked at the
+ * parent, so that we may have missed the downlink of the right sibling
+ * when we scanned the parent. If so, add the right sibling to the
+ * stack now.
+ */
+ if (!GinPageRightMost(page) &&
+ ginCompareItemPointers(&stack->parenttup->t_tid, GinDataPageGetRightBound(page)) >= 0)
+ {
+ /* split page detected, install right link to the stack */
+ GinScanItem *ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
+
+ ptr->depth = stack->depth;
+ ptr->parenttup = CopyIndexTuple(stack->parenttup);
+ ptr->parentblk = stack->parentblk;
+ ptr->parentlsn = stack->parentlsn;
+ ptr->blkno = GinPageGetOpaque(page)->rightlink;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+
+
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page))
{
--
2.17.1
From 3428181ec49090fee4235df2743e83ef2a154302 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 29 Mar 2020 16:45:35 +0500
Subject: [PATCH 04/15] Add gincheckpage (copy from gistcheckpage)
---
contrib/amcheck/verify_gin.c | 2 +-
src/backend/access/gin/ginutil.c | 34 ++++++++++++++++++++++++++++++++
src/include/access/gin_private.h | 1 +
3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 335c7eb6d3..263cabe3d1 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -249,7 +249,7 @@ check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
{
Page page = BufferGetPage(buffer);
-// gistcheckpage(rel, buffer);
+ gincheckpage(rel, buffer);
if (GinPageIsDeleted(page))
{
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8b345fe88a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -704,3 +704,37 @@ ginUpdateStats(Relation index, const GinStatsData *stats, bool is_build)
END_CRIT_SECTION();
}
+
+/*
+ * Verify that a freshly-read page looks sane.
+ */
+void
+gincheckpage(Relation rel, Buffer buf)
+{
+ Page page = BufferGetPage(buf);
+
+ /*
+ * ReadBuffer verifies that every newly-read page passes
+ * PageHeaderIsValid, which means it either contains a reasonably sane
+ * page header or is all-zero. We have to defend against the all-zero
+ * case, however.
+ */
+ if (PageIsNew(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" contains unexpected zero page at block %u",
+ RelationGetRelationName(rel),
+ BufferGetBlockNumber(buf)),
+ errhint("Please REINDEX it.")));
+
+ /*
+ * Additionally check that the special area looks sane.
+ */
+ if (PageGetSpecialSize(page) != MAXALIGN(sizeof(GinPageOpaqueData)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" contains corrupted page at block %u",
+ RelationGetRelationName(rel),
+ BufferGetBlockNumber(buf)),
+ errhint("Please REINDEX it.")));
+}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 71eeac205c..23e539b059 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -113,6 +113,7 @@ extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
extern IndexBuildResult *ginbuild(Relation heap, Relation index,
struct IndexInfo *indexInfo);
extern void ginbuildempty(Relation index);
+extern void gincheckpage(Relation rel, Buffer buf);
extern bool gininsert(Relation index, Datum *values, bool *isnull,
ItemPointer ht_ctid, Relation heapRel,
IndexUniqueCheck checkUnique,
--
2.17.1
From 897bf557eb98c5932ba04199c674e6aa7448e82e Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Mon, 6 Apr 2020 22:51:11 +0500
Subject: [PATCH 05/15] Check order of elements on page + check parent
consistency
---
contrib/amcheck/verify_gin.c | 137 +++++++++++++++++++++++++++++++----
1 file changed, 122 insertions(+), 15 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 263cabe3d1..e400652a10 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -108,13 +108,7 @@ gin_index_checkable(Relation rel)
/*
* Main entry point for GIN check. Allocates memory context and scans through
- * GIN graph. This function verifies that tuples of internal pages cover all
- * the key space of each tuple on leaf page. To do this we invoke
- * gin_check_internal_page() for every internal page.
- *
- * giin_check_internal_page() in it's turn takes every tuple and tries to
- * adjust it by tuples on referenced child page. Parent gin tuple should
- * never require any adjustments.
+ * GIN graph.
*/
static void
gin_check_parent_keys_consistency(Relation rel)
@@ -123,17 +117,21 @@ gin_check_parent_keys_consistency(Relation rel)
GinScanItem *stack;
MemoryContext mctx;
MemoryContext oldcontext;
- int leafdepth;
+ GinState *state;
+
+ int leafdepth;
mctx = AllocSetContextCreate(CurrentMemoryContext,
"amcheck context",
ALLOCSET_DEFAULT_SIZES);
oldcontext = MemoryContextSwitchTo(mctx);
+ state = (GinState *) palloc(sizeof(GinState));
+ initGinState(state, rel);
- /*
- * We don't know the height of the tree yet, but as soon as we encounter a
- * leaf page, we will set 'leafdepth' to its depth.
- */
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
leafdepth = -1;
/* Start the scan at the root page */
@@ -198,14 +196,15 @@ gin_check_parent_keys_consistency(Relation rel)
}
/*
- * Check that each tuple looks valid, and is consistent with the
- * downlink we followed when we stepped on this page.
+ * Check that tuples in each page are properly ordered and consistent with parent high key
*/
maxoff = PageGetMaxOffsetNumber(page);
+ IndexTuple prev_tuple = NULL;
for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
{
ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i, sizeof(GinPageOpaqueData));
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+ OffsetNumber attnum = gintuple_get_attrnum(state, idxtuple);
if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
ereport(ERROR,
@@ -213,7 +212,66 @@ gin_check_parent_keys_consistency(Relation rel)
errmsg("index \"%s\" has inconsistent tuple sizes, block %u, offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
- /* If this is an internal page, recurse into the child */
+ GinNullCategory prev_key_category;
+ Datum prev_key;
+ GinNullCategory current_key_category;
+ Datum current_key = gintuple_get_key(state, idxtuple, &current_key_category);
+
+ if (i != FirstOffsetNumber) {
+ prev_key = gintuple_get_key(state, prev_tuple, &prev_key_category);
+
+ if (ginCompareEntries(state, attnum, prev_key, current_key, prev_key_category, current_key_category) <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has wrong tuple order, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ }
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+ if (stack->parenttup &&
+ i == maxoff) {
+ GinNullCategory parent_key_category;
+ Datum parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
+
+ if (ginCompareEntries(state, attnum, current_key, parent_key, current_key_category, parent_key_category) <= 0) {
+
+
+ /*
+ * There was a discrepancy between parent and child tuples.
+ * We need to verify it is not a result of concurrent call of
+ * gistplacetopage(). So, lock parent and try to find downlink
+ * for current page. It may be missing due to concurrent page
+ * split, this is OK.
+ */
+ pfree(stack->parenttup);
+ stack->parenttup = gin_refind_parent(rel, stack->parentblk,
+ stack->blkno, strategy);
+
+ /* We found it - make a final check before failing */
+ if (!stack->parenttup)
+ elog(NOTICE, "Unable to find parent tuple for block %u on block %u due to concurrent split",
+ stack->blkno, stack->parentblk);
+ else {
+ parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
+ if (ginCompareEntries(state, attnum, current_key, parent_key, current_key_category, parent_key_category) <=0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ else {
+ /*
+ * But now it is properly adjusted - nothing to do here.
+ */
+ }
+ }
+ }
+ }
+
+ /* If this is an internal page, recurse into the child */
if (!GinPageIsLeaf(page))
{
GinScanItem *ptr;
@@ -227,6 +285,8 @@ gin_check_parent_keys_consistency(Relation rel)
ptr->next = stack->next;
stack->next = ptr;
}
+
+ prev_tuple = idxtuple;
}
LockBuffer(buffer, GIN_UNLOCK);
@@ -270,3 +330,50 @@ check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
errmsg("index \"%s\" has page %d with exceeding count of tuples",
RelationGetRelationName(rel), blockNo)));
}
+
+/*
+ * Try to re-find downlink pointing to 'blkno', in 'parentblkno'.
+ *
+ * If found, returns a palloc'd copy of the downlink tuple. Otherwise,
+ * returns NULL.
+ */
+static IndexTuple
+gin_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno, BufferAccessStrategy strategy)
+{
+ Buffer parentbuf;
+ Page parentpage;
+ OffsetNumber o,
+ parent_maxoff;
+ IndexTuple result = NULL;
+
+ parentbuf = ReadBufferExtended(rel, MAIN_FORKNUM, parentblkno, RBM_NORMAL,
+ strategy);
+
+ LockBuffer(parentbuf, GIN_SHARE);
+ parentpage = BufferGetPage(parentbuf);
+
+ if (GinPageIsLeaf(parentpage))
+ {
+ UnlockReleaseBuffer(parentbuf);
+ return result;
+ }
+
+ parent_maxoff = PageGetMaxOffsetNumber(parentpage);
+ for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(o))
+ {
+ ItemId p_iid = PageGetItemIdCareful(rel, parentblkno, parentpage, o, sizeof(GinPageOpaqueData));
+ IndexTuple itup = (IndexTuple) PageGetItem(parentpage, p_iid);
+
+ if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
+ {
+ /* Found it! Make copy and return it */
+ result = CopyIndexTuple(itup);
+ break;
+ }
+ }
+
+ UnlockReleaseBuffer(parentbuf);
+
+ return result;
+}
\ No newline at end of file
--
2.17.1
From 68552701e4fdb86e9c780545481a1c2286482f1a Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 12 Apr 2020 16:31:52 +0500
Subject: [PATCH 06/15] Fix split detection
---
contrib/amcheck/verify_gin.c | 38 +++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index e400652a10..88866aad48 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -167,22 +167,32 @@ gin_check_parent_keys_consistency(Relation rel)
* when we scanned the parent. If so, add the right sibling to the
* stack now.
*/
- if (!GinPageRightMost(page) &&
- ginCompareItemPointers(&stack->parenttup->t_tid, GinDataPageGetRightBound(page)) >= 0)
- {
- /* split page detected, install right link to the stack */
- GinScanItem *ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
-
- ptr->depth = stack->depth;
- ptr->parenttup = CopyIndexTuple(stack->parenttup);
- ptr->parentblk = stack->parentblk;
- ptr->parentlsn = stack->parentlsn;
- ptr->blkno = GinPageGetOpaque(page)->rightlink;
- ptr->next = stack->next;
- stack->next = ptr;
+ if (stack -> parenttup != NULL ) {
+ GinNullCategory parent_key_category;
+ Datum parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
+ maxoff = PageGetMaxOffsetNumber(page);
+ ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, maxoff, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+ OffsetNumber attnum = gintuple_get_attrnum(state, idxtuple);
+ GinNullCategory page_max_key_category;
+ Datum page_max_key = gintuple_get_key(state, idxtuple, &page_max_key_category);
+
+ if (GinPageGetOpaque(page)->rightlink != InvalidBlockNumber &&
+ ginCompareEntries(state, attnum, page_max_key, parent_key, page_max_key_category, parent_key_category) <= 0 ) {
+ elog(INFO, "split detected");
+ /* split page detected, install right link to the stack */
+ GinScanItem *ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
+
+ ptr->depth = stack->depth;
+ ptr->parenttup = CopyIndexTuple(stack->parenttup);
+ ptr->parentblk = stack->parentblk;
+ ptr->parentlsn = stack->parentlsn;
+ ptr->blkno = GinPageGetOpaque(page)->rightlink;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
}
-
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page))
{
--
2.17.1
From e520ee72f96ad72a10d3546c9df85d8fc95c3d62 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sat, 2 May 2020 16:41:48 +0500
Subject: [PATCH 07/15] Add stub leaf validation
---
contrib/amcheck/verify_gin.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 88866aad48..45e00e5dfb 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -106,6 +106,20 @@ gin_index_checkable(Relation rel)
errdetail("Index is not valid")));
}
+static void validate_leaf(Page page, Relation rel, BlockNumber blkno) {
+ OffsetNumber i, maxoff;
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) {
+ ItemId iid = PageGetItemIdCareful(rel, blkno, page, i, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+ if (GinIsPostingTree(idxtuple)) {
+ elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i);
+ } else {
+ elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i);
+ }
+ }
+}
+
/*
* Main entry point for GIN check. Allocates memory context and scans through
* GIN graph.
@@ -294,6 +308,8 @@ gin_check_parent_keys_consistency(Relation rel)
ptr->parentlsn = lsn;
ptr->next = stack->next;
stack->next = ptr;
+ } else {
+ validate_leaf(page, rel, stack->blkno);
}
prev_tuple = idxtuple;
--
2.17.1
From 769b1f5a1e9134f86ce099dd5de15a7ecb98ee64 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sat, 2 May 2020 16:59:36 +0500
Subject: [PATCH 08/15] Add posting list validation
---
contrib/amcheck/verify_gin.c | 50 ++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 45e00e5dfb..1a95f08be3 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -77,6 +77,44 @@ Datum gin_index_parent_check(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Read item pointers from leaf entry tuple.
+ *
+ * Returns a palloc'd array of ItemPointers. The number of items is returned
+ * in *nitems.
+ */
+ItemPointer
+ginReadTupleWithoutState( IndexTuple itup, int *nitems)
+{
+ Pointer ptr = GinGetPosting(itup);
+ int nipd = GinGetNPosting(itup);
+ ItemPointer ipd;
+ int ndecoded;
+
+ if (GinItupIsCompressed(itup))
+ {
+ if (nipd > 0)
+ {
+ ipd = ginPostingListDecode((GinPostingList *) ptr, &ndecoded);
+ if (nipd != ndecoded)
+ elog(ERROR, "number of items mismatch in GIN entry tuple, %d in tuple header, %d decoded",
+ nipd, ndecoded);
+ }
+ else
+ {
+ ipd = palloc(0);
+ }
+ }
+ else
+ {
+ ipd = (ItemPointer) palloc(sizeof(ItemPointerData) * nipd);
+ memcpy(ipd, ptr, sizeof(ItemPointerData) * nipd);
+ }
+ *nitems = nipd;
+ return ipd;
+}
+
+
/*
* Check that relation is eligible for GIN verification
*/
@@ -116,6 +154,18 @@ static void validate_leaf(Page page, Relation rel, BlockNumber blkno) {
elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i);
} else {
elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i);
+
+ ItemPointer ipd;
+ int nipd;
+
+ ipd = ginReadTupleWithoutState(idxtuple, &nipd);
+ if (!OffsetNumberIsValid(ItemPointerGetOffsetNumber(&ipd[nipd-1]))){
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": posting list contains invalid heap pointer on block %u",
+ RelationGetRelationName(rel), blkno)));
+ }
+ pfree(ipd);
}
}
}
--
2.17.1
From e854aee17ae8c4eab0ab1024e419b1e22349deb7 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 3 May 2020 19:23:37 +0500
Subject: [PATCH 09/15] Add iteration over posting tree
---
contrib/amcheck/amcheck.c | 21 ++--
contrib/amcheck/sql/check_gin.sql | 1 +
contrib/amcheck/verify_gin.c | 167 +++++++++++++++++++++++++++++-
3 files changed, 178 insertions(+), 11 deletions(-)
diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
index 5bc3f831bc..b0fe77c681 100644
--- a/contrib/amcheck/amcheck.c
+++ b/contrib/amcheck/amcheck.c
@@ -145,16 +145,17 @@ PageGetItemIdCareful(Relation rel, BlockNumber block, Page page,
* never uses either. Verify that line pointer has storage, too, since
* even LP_DEAD items should.
*/
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
- ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+// TODO adapt for gin and uncomment
+// if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
+// ItemIdGetLength(itemid) == 0)
+// ereport(ERROR,
+// (errcode(ERRCODE_INDEX_CORRUPTED),
+// errmsg("invalid line pointer storage in index \"%s\"",
+// RelationGetRelationName(rel)),
+// errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
+// block, offset, ItemIdGetOffset(itemid),
+// ItemIdGetLength(itemid),
+// ItemIdGetFlags(itemid))));
return itemid;
}
\ No newline at end of file
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
index 789813accf..c685d73aed 100644
--- a/contrib/amcheck/sql/check_gin.sql
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -2,6 +2,7 @@
SELECT setseed(1);
CREATE TABLE "gin_check"("Column1" int[]);
INSERT INTO gin_check select array_agg(round(random()*100)) from generate_series(1, 1000000) as i group by i % 100000;
+INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
SELECT gin_index_parent_check('gin_check_idx');
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 1a95f08be3..dd1b506dd1 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -39,6 +39,19 @@ typedef struct GinScanItem
struct GinScanItem *next;
} GinScanItem;
+/*
+ * GinPostingTreeScanItem represents one item of depth-first scan of GIN posting tree.
+ */
+typedef struct GinPostingTreeScanItem
+{
+ int depth;
+ PostingItem *parenttup;
+ BlockNumber parentblk;
+ BlockNumber blkno;
+ struct GinPostingTreeScanItem *next;
+} GinPostingTreeScanItem;
+
+
PG_FUNCTION_INFO_V1(gin_index_parent_check);
static void gin_index_checkable(Relation rel);
@@ -144,6 +157,154 @@ gin_index_checkable(Relation rel)
errdetail("Index is not valid")));
}
+/*
+ * Allocates a memory context and scans through the posting tree graph.
+ *
+ */
+static void
+gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting_tree_root)
+{
+ BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
+ GinPostingTreeScanItem *stack;
+ MemoryContext mctx;
+ MemoryContext oldcontext;
+
+ int leafdepth;
+
+ mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(mctx);
+
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
+ leafdepth = -1;
+
+ /* Start the scan at the root page */
+ stack = (GinPostingTreeScanItem *) palloc0(sizeof(GinPostingTreeScanItem));
+ stack->depth = 0;
+ stack->parenttup = NULL;
+ stack->parentblk = InvalidBlockNumber;
+ stack->blkno = posting_tree_root;
+
+ while (stack)
+ {
+// elog(INFO, "processing block %u", stack->blkno);
+ GinPostingTreeScanItem *stack_next;
+ Buffer buffer;
+ Page page;
+ OffsetNumber i, maxoff;
+
+ CHECK_FOR_INTERRUPTS();
+
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
+ RBM_NORMAL, strategy);
+ LockBuffer(buffer, GIN_SHARE);
+ page = (Page) BufferGetPage(buffer);
+// elog(INFO, "before assert");
+ Assert(GinPageIsData(page));
+// elog(INFO, "after assert");
+
+ /* Check that the tree has the same height in all branches */
+ if (GinPageIsLeaf(page))
+ {
+// elog(INFO, "page is leaf");
+ if (leafdepth == -1)
+ leafdepth = stack->depth;
+ else if (stack->depth != leafdepth)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
+// elog(INFO, "after checking for leaf assert");
+
+ if (stack->parenttup) {
+// elog(INFO, "parent block u% , parent offset u%", ItemPointerGetBlockNumberNoCheck(&stack->parenttup->key), ItemPointerGetOffsetNumberNoCheck(&stack->parenttup->key) );
+ }
+// elog(INFO, "after printing parent info");
+
+ /*
+ * Check that tuples in each page are properly ordered and consistent with parent high key
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+// elog(INFO, "maxoff %u", (unsigned int)maxoff);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ PostingItem* posting_item = GinDataPageGetPostingItem(page, i);
+ if (ItemPointerGetBlockNumberNoCheck(&posting_item->key) == 0 || ItemPointerGetOffsetNumberNoCheck(&posting_item->key) == 0)
+ {
+ continue;
+ }
+// elog(INFO, "block %u offset %u", ItemPointerGetBlockNumber(&posting_item->key), ItemPointerGetOffsetNumber(&posting_item->key) );
+// if (i!= FirstOffsetNumber){
+// PostingItem* previous_posting_item = GinDataPageGetPostingItem(page, i-1);
+// if (ItemPointerCompare(&posting_item->key, &previous_posting_item->key) < 0 ) {
+// ereport(ERROR,
+// (errcode(ERRCODE_INDEX_CORRUPTED),
+// errmsg("index \"%s\" has wrong tuple order in posting tree, block %u, offset %u",
+// RelationGetRelationName(rel), stack->blkno, i)));
+// }
+//
+// }
+// elog(INFO, "Check if this tuple is consistent with the downlink in the");
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+// if (stack->parenttup && i == maxoff) {
+// if (ItemPointerCompare(&stack->parenttup->key, &posting_item->key) < 0) {
+// ereport(ERROR,
+// (errcode(ERRCODE_INDEX_CORRUPTED),
+// errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+// RelationGetRelationName(rel), stack->blkno, i)));
+//
+// }
+// }
+
+// elog(INFO, " If this is an internal page, recurse into the child ");
+ /* If this is an internal page, recurse into the child */
+ if (!GinPageIsLeaf(page))
+ {
+ GinPostingTreeScanItem *ptr;
+
+ ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
+ ptr->depth = stack->depth + 1;
+ ptr->parenttup = posting_item;
+ ptr->parentblk = stack->blkno;
+ ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+
+ }
+
+// elog(INFO, " Step to next item in the queue ");
+ LockBuffer(buffer, GIN_UNLOCK);
+ ReleaseBuffer(buffer);
+
+ /* Step to next item in the queue */
+ stack_next = stack->next;
+// elog(INFO, " before freeing parent tup");
+//TODO uncomment and fix
+// if (stack->parenttup)
+// pfree(stack->parenttup);
+
+// elog(INFO, " after freeing parent tup");
+ pfree(stack);
+// elog(INFO, " after freeing stack");
+
+ stack = stack_next;
+ }
+// elog(INFO, "while ended");
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(mctx);
+}
+
static void validate_leaf(Page page, Relation rel, BlockNumber blkno) {
OffsetNumber i, maxoff;
maxoff = PageGetMaxOffsetNumber(page);
@@ -152,6 +313,8 @@ static void validate_leaf(Page page, Relation rel, BlockNumber blkno) {
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
if (GinIsPostingTree(idxtuple)) {
elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i);
+ BlockNumber rootPostingTree = GinGetPostingTree(idxtuple);
+ gin_check_posting_tree_parent_keys_consistency(rel, rootPostingTree);
} else {
elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i);
@@ -452,4 +615,6 @@ gin_refind_parent(Relation rel, BlockNumber parentblkno,
UnlockReleaseBuffer(parentbuf);
return result;
-}
\ No newline at end of file
+}
+
+
--
2.17.1
From 34579226ff9c550754e4f6678178aa573c16818e Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Mon, 4 May 2020 14:41:00 +0500
Subject: [PATCH 10/15] Fix non-leaves posting tree validation
---
contrib/amcheck/verify_gin.c | 127 +++++++++++++++++------------------
1 file changed, 62 insertions(+), 65 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index dd1b506dd1..e1b901ba60 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -191,7 +191,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
while (stack)
{
-// elog(INFO, "processing block %u", stack->blkno);
+ elog(INFO, "processing block %u", stack->blkno);
GinPostingTreeScanItem *stack_next;
Buffer buffer;
Page page;
@@ -210,7 +210,6 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page))
{
-// elog(INFO, "page is leaf");
if (leafdepth == -1)
leafdepth = stack->depth;
else if (stack->depth != leafdepth)
@@ -218,88 +217,86 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
RelationGetRelationName(rel), stack->blkno)));
- }
-// elog(INFO, "after checking for leaf assert");
-
- if (stack->parenttup) {
-// elog(INFO, "parent block u% , parent offset u%", ItemPointerGetBlockNumberNoCheck(&stack->parenttup->key), ItemPointerGetOffsetNumberNoCheck(&stack->parenttup->key) );
- }
-// elog(INFO, "after printing parent info");
+ } else {
- /*
- * Check that tuples in each page are properly ordered and consistent with parent high key
- */
- maxoff = PageGetMaxOffsetNumber(page);
-// elog(INFO, "maxoff %u", (unsigned int)maxoff);
- for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
- {
- PostingItem* posting_item = GinDataPageGetPostingItem(page, i);
- if (ItemPointerGetBlockNumberNoCheck(&posting_item->key) == 0 || ItemPointerGetOffsetNumberNoCheck(&posting_item->key) == 0)
- {
- continue;
- }
-// elog(INFO, "block %u offset %u", ItemPointerGetBlockNumber(&posting_item->key), ItemPointerGetOffsetNumber(&posting_item->key) );
-// if (i!= FirstOffsetNumber){
-// PostingItem* previous_posting_item = GinDataPageGetPostingItem(page, i-1);
-// if (ItemPointerCompare(&posting_item->key, &previous_posting_item->key) < 0 ) {
-// ereport(ERROR,
-// (errcode(ERRCODE_INDEX_CORRUPTED),
-// errmsg("index \"%s\" has wrong tuple order in posting tree, block %u, offset %u",
-// RelationGetRelationName(rel), stack->blkno, i)));
-// }
-//
-// }
-// elog(INFO, "Check if this tuple is consistent with the downlink in the");
/*
- * Check if this tuple is consistent with the downlink in the
- * parent.
+ * Check that tuples in each page are properly ordered and consistent with parent high key
*/
-// if (stack->parenttup && i == maxoff) {
-// if (ItemPointerCompare(&stack->parenttup->key, &posting_item->key) < 0) {
-// ereport(ERROR,
-// (errcode(ERRCODE_INDEX_CORRUPTED),
-// errmsg("index \"%s\" has inconsistent records on page %u offset %u",
-// RelationGetRelationName(rel), stack->blkno, i)));
-//
-// }
-// }
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) {
+ PostingItem *posting_item = GinDataPageGetPostingItem(page, i);
+ if (ItemPointerGetBlockNumberNoCheck(&posting_item->key) == 0 ||
+ ItemPointerGetOffsetNumberNoCheck(&posting_item->key) == 0) {
+ continue;
+ }
+ elog(INFO, "key block %u offset %u", ItemPointerGetBlockNumber(&posting_item->key),
+ ItemPointerGetOffsetNumber(&posting_item->key));
+
+ if (i != FirstOffsetNumber) {
+ PostingItem *previous_posting_item = GinDataPageGetPostingItem(page, i - 1);
+
+ if (ItemPointerCompare(&posting_item->key, &previous_posting_item->key) < 0) {
+
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has wrong tuple order in posting tree, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ }
+
+ }
+// elog(INFO, "Check if this tuple is consistent with the downlink in the");
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+ if (stack->parenttup && i == maxoff) {
+ elog(INFO, "max offset is reached");
+ if (ItemPointerCompare(&stack->parenttup->key, &posting_item->key) < 0) {
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ }
+ }
// elog(INFO, " If this is an internal page, recurse into the child ");
- /* If this is an internal page, recurse into the child */
- if (!GinPageIsLeaf(page))
- {
- GinPostingTreeScanItem *ptr;
-
- ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
- ptr->depth = stack->depth + 1;
- ptr->parenttup = posting_item;
- ptr->parentblk = stack->blkno;
- ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
- ptr->next = stack->next;
- stack->next = ptr;
- }
+ /* If this is an internal page, recurse into the child */
+ if (!GinPageIsLeaf(page)) {
+ GinPostingTreeScanItem *ptr;
+
+ ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
+ ptr->depth = stack->depth + 1;
+ ptr->parenttup = posting_item;
+ ptr->parentblk = stack->blkno;
+ ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+ }
}
-
-// elog(INFO, " Step to next item in the queue ");
LockBuffer(buffer, GIN_UNLOCK);
ReleaseBuffer(buffer);
/* Step to next item in the queue */
stack_next = stack->next;
// elog(INFO, " before freeing parent tup");
-//TODO uncomment and fix
-// if (stack->parenttup)
-// pfree(stack->parenttup);
+ //TODO uncomment and fix
+// if (stack->parenttup)
+// pfree(stack->parenttup);
-// elog(INFO, " after freeing parent tup");
+ // elog(INFO, " after freeing parent tup");
pfree(stack);
-// elog(INFO, " after freeing stack");
+ // elog(INFO, " after freeing stack");
stack = stack_next;
}
-// elog(INFO, "while ended");
+
+
+ // elog(INFO, "while ended");
MemoryContextSwitchTo(oldcontext);
MemoryContextDelete(mctx);
--
2.17.1
From 6b853444e7ec4b384660e5a34f6e9548abd0e973 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Mon, 4 May 2020 19:39:13 +0500
Subject: [PATCH 11/15] Fix parent check in posting tree
---
contrib/amcheck/verify_gin.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index e1b901ba60..7b1b17bcc6 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -45,8 +45,7 @@ typedef struct GinScanItem
typedef struct GinPostingTreeScanItem
{
int depth;
- PostingItem *parenttup;
- BlockNumber parentblk;
+ ItemPointer parentkey;
BlockNumber blkno;
struct GinPostingTreeScanItem *next;
} GinPostingTreeScanItem;
@@ -185,8 +184,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
/* Start the scan at the root page */
stack = (GinPostingTreeScanItem *) palloc0(sizeof(GinPostingTreeScanItem));
stack->depth = 0;
- stack->parenttup = NULL;
- stack->parentblk = InvalidBlockNumber;
+ stack->parentkey = NULL;
stack->blkno = posting_tree_root;
while (stack)
@@ -208,8 +206,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
// elog(INFO, "after assert");
/* Check that the tree has the same height in all branches */
- if (GinPageIsLeaf(page))
- {
+ if (GinPageIsLeaf(page)) {
if (leafdepth == -1)
leafdepth = stack->depth;
else if (stack->depth != leafdepth)
@@ -217,6 +214,19 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
RelationGetRelationName(rel), stack->blkno)));
+ ItemPointerData minItem;
+ ItemPointerSetMin(&minItem);
+ int nlist;
+ ItemPointerData *list;
+ elog(INFO, "before GinDataLeafPageGetItems");
+ list = GinDataLeafPageGetItems(page, nlist, minItem);
+ elog(INFO, "after GinDataLeafPageGetItems");
+ if (stack->parentkey && ItemPointerCompare(stack->parentkey, &list[0]) < 0) {
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": tid exceeds parent's high key in postingTree leaf on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
} else {
@@ -251,9 +261,9 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
* Check if this tuple is consistent with the downlink in the
* parent.
*/
- if (stack->parenttup && i == maxoff) {
+ if (stack->parentkey && i == maxoff) {
elog(INFO, "max offset is reached");
- if (ItemPointerCompare(&stack->parenttup->key, &posting_item->key) < 0) {
+ if (ItemPointerCompare(stack->parentkey, &posting_item->key) < 0) {
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has inconsistent records on page %u offset %u",
@@ -269,8 +279,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
ptr->depth = stack->depth + 1;
- ptr->parenttup = posting_item;
- ptr->parentblk = stack->blkno;
+ ptr->parentkey = &posting_item->key;
ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
ptr->next = stack->next;
stack->next = ptr;
@@ -284,11 +293,6 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
/* Step to next item in the queue */
stack_next = stack->next;
// elog(INFO, " before freeing parent tup");
- //TODO uncomment and fix
-// if (stack->parenttup)
-// pfree(stack->parenttup);
-
- // elog(INFO, " after freeing parent tup");
pfree(stack);
// elog(INFO, " after freeing stack");
@@ -613,5 +617,3 @@ gin_refind_parent(Relation rel, BlockNumber parentblkno,
return result;
}
-
-
--
2.17.1
From 07e8bb45f446ee5ab2a513f451f23695632ba847 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Sun, 17 May 2020 16:43:13 +0500
Subject: [PATCH 12/15] WIP all known bugs fixed except for ginCompareEntries
---
contrib/amcheck/sql/check_gin.sql | 19 +++++++-
contrib/amcheck/verify_gin.c | 78 +++++++++++++++++--------------
src/include/access/nbtree.h | 1 +
3 files changed, 61 insertions(+), 37 deletions(-)
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
index c685d73aed..f5c11ad02d 100644
--- a/contrib/amcheck/sql/check_gin.sql
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -1,10 +1,25 @@
-- minimal test, basically just verifying that amcheck works with GIN
SELECT setseed(1);
CREATE TABLE "gin_check"("Column1" int[]);
-INSERT INTO gin_check select array_agg(round(random()*100)) from generate_series(1, 1000000) as i group by i % 100000;
-INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+-- INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
SELECT gin_index_parent_check('gin_check_idx');
-- cleanup
DROP TABLE gin_check;
+
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check_text"("Column1" varchar[32]);
+-- posting trees
+INSERT INTO gin_check_text select array_agg(md5(round(random()*300)::text)) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+-- INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
+CREATE INDEX gin_check_idx_text on "gin_check_text" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_idx_text');
+
+-- cleanup
+DROP TABLE gin_check_text;
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 7b1b17bcc6..f8711f4e97 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -20,11 +20,13 @@
#include "postgres.h"
#include "access/gin_private.h"
+#include "access/nbtree.h"
#include "amcheck.h"
#include "catalog/pg_am.h"
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "string.h"
/*
* GinScanItem represents one item of depth-first scan of GIN index.
@@ -201,9 +203,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
RBM_NORMAL, strategy);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
-// elog(INFO, "before assert");
Assert(GinPageIsData(page));
-// elog(INFO, "after assert");
/* Check that the tree has the same height in all branches */
if (GinPageIsLeaf(page)) {
@@ -218,9 +218,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
ItemPointerSetMin(&minItem);
int nlist;
ItemPointerData *list;
- elog(INFO, "before GinDataLeafPageGetItems");
list = GinDataLeafPageGetItems(page, nlist, minItem);
- elog(INFO, "after GinDataLeafPageGetItems");
if (stack->parentkey && ItemPointerCompare(stack->parentkey, &list[0]) < 0) {
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -255,14 +253,12 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
}
}
-// elog(INFO, "Check if this tuple is consistent with the downlink in the");
/*
* Check if this tuple is consistent with the downlink in the
* parent.
*/
if (stack->parentkey && i == maxoff) {
- elog(INFO, "max offset is reached");
if (ItemPointerCompare(stack->parentkey, &posting_item->key) < 0) {
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -272,7 +268,6 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
}
}
-// elog(INFO, " If this is an internal page, recurse into the child ");
/* If this is an internal page, recurse into the child */
if (!GinPageIsLeaf(page)) {
GinPostingTreeScanItem *ptr;
@@ -292,16 +287,10 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
/* Step to next item in the queue */
stack_next = stack->next;
-// elog(INFO, " before freeing parent tup");
pfree(stack);
- // elog(INFO, " after freeing stack");
-
stack = stack_next;
}
-
- // elog(INFO, "while ended");
-
MemoryContextSwitchTo(oldcontext);
MemoryContextDelete(mctx);
}
@@ -345,7 +334,7 @@ gin_check_parent_keys_consistency(Relation rel)
GinScanItem *stack;
MemoryContext mctx;
MemoryContext oldcontext;
- GinState *state;
+ GinState state;
int leafdepth;
@@ -353,8 +342,7 @@ gin_check_parent_keys_consistency(Relation rel)
"amcheck context",
ALLOCSET_DEFAULT_SIZES);
oldcontext = MemoryContextSwitchTo(mctx);
- state = (GinState *) palloc(sizeof(GinState));
- initGinState(state, rel);
+ initGinState(&state, rel);
/*
* We don't know the height of the tree yet, but as soon as we encounter a
@@ -372,6 +360,7 @@ gin_check_parent_keys_consistency(Relation rel)
while (stack)
{
+ elog(INFO, "processing block %u", stack->blkno);
GinScanItem *stack_next;
Buffer buffer;
Page page;
@@ -397,16 +386,19 @@ gin_check_parent_keys_consistency(Relation rel)
*/
if (stack -> parenttup != NULL ) {
GinNullCategory parent_key_category;
- Datum parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
+ Datum parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
maxoff = PageGetMaxOffsetNumber(page);
ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, maxoff, sizeof(GinPageOpaqueData));
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
- OffsetNumber attnum = gintuple_get_attrnum(state, idxtuple);
+ OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
GinNullCategory page_max_key_category;
- Datum page_max_key = gintuple_get_key(state, idxtuple, &page_max_key_category);
+ Datum page_max_key = gintuple_get_key(&state, idxtuple, &page_max_key_category);
if (GinPageGetOpaque(page)->rightlink != InvalidBlockNumber &&
- ginCompareEntries(state, attnum, page_max_key, parent_key, page_max_key_category, parent_key_category) <= 0 ) {
+ DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
+ state.supportCollation[attnum - 1],
+ parent_key, page_max_key_category)) <= 0) {
+// ginCompareEntries(&state, attnum, page_max_key, parent_key, page_max_key_category, parent_key_category) <= 0 ) {
elog(INFO, "split detected");
/* split page detected, install right link to the stack */
GinScanItem *ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
@@ -442,8 +434,7 @@ gin_check_parent_keys_consistency(Relation rel)
{
ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i, sizeof(GinPageOpaqueData));
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
- OffsetNumber attnum = gintuple_get_attrnum(state, idxtuple);
-
+ OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -453,13 +444,18 @@ gin_check_parent_keys_consistency(Relation rel)
GinNullCategory prev_key_category;
Datum prev_key;
GinNullCategory current_key_category;
- Datum current_key = gintuple_get_key(state, idxtuple, &current_key_category);
-
- if (i != FirstOffsetNumber) {
- prev_key = gintuple_get_key(state, prev_tuple, &prev_key_category);
-
- if (ginCompareEntries(state, attnum, prev_key, current_key, prev_key_category, current_key_category) <= 0)
- ereport(ERROR,
+ Datum current_key = gintuple_get_key(&state, idxtuple, &current_key_category);
+
+ if (i != FirstOffsetNumber && stack->blkno != (BlockNumber) 1) {
+ prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
+// elog(INFO, "prev_key_category %u, current_key_category %u", prev_key_category, current_key_category);
+ elog(INFO, "prev_key %u, current_key %u", prev_key, current_key);
+ elog(INFO, "collation id %u", state.supportCollation[0]);
+ if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
+ state.supportCollation[attnum - 1],
+ prev_key, current_key)) >= 0)
+// if (ginCompareEntries(&state, attnum, prev_key, current_key, prev_key_category, current_key_category) >= 0)
+ ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has wrong tuple order, block %u, offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
@@ -473,10 +469,18 @@ gin_check_parent_keys_consistency(Relation rel)
if (stack->parenttup &&
i == maxoff) {
GinNullCategory parent_key_category;
- Datum parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
-
- if (ginCompareEntries(state, attnum, current_key, parent_key, current_key_category, parent_key_category) <= 0) {
-
+ Datum parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
+// elog(INFO, "parent_key %lu, current_key %lu", parent_key, current_key);
+// elog(INFO, "parent_key %lu, current_key %lu", DatumGetInt32(parent_key), DatumGetInt32(current_key));
+// elog(INFO, "comparision result %i", ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category));
+// elog(INFO, "comparision result %i", DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
+// state.supportCollation[attnum - 1],
+// current_key, parent_key)));
+
+// if (ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category) < 0) {
+ if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
+ state.supportCollation[attnum - 1],
+ current_key, parent_key)) < 0){
/*
* There was a discrepancy between parent and child tuples.
@@ -494,8 +498,12 @@ gin_check_parent_keys_consistency(Relation rel)
elog(NOTICE, "Unable to find parent tuple for block %u on block %u due to concurrent split",
stack->blkno, stack->parentblk);
else {
- parent_key = gintuple_get_key(state, stack->parenttup, &parent_key_category);
- if (ginCompareEntries(state, attnum, current_key, parent_key, current_key_category, parent_key_category) <=0)
+ parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
+ elog(INFO, "parent_key %u, current_key %u after gin_refind_parent", parent_key, current_key);
+// if (ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category) <0)
+ if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
+ state.supportCollation[attnum - 1],
+ current_key, parent_key)) < 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has inconsistent records on page %u offset %u",
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 84a7e80e24..99efaff5f9 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1153,4 +1153,5 @@ extern IndexBuildResult *btbuild(Relation heap, Relation index,
struct IndexInfo *indexInfo);
extern void _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc);
+extern Datum btint4cmp(PG_FUNCTION_ARGS);
#endif /* NBTREE_H */
--
2.17.1
From 9aa53a73ae26e2a8113ce5351718187face55427 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Wed, 20 May 2020 22:03:54 +0500
Subject: [PATCH 13/15] Fixed ordering checks
---
contrib/amcheck/verify_gin.c | 35 +++++++++--------------------------
1 file changed, 9 insertions(+), 26 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index f8711f4e97..0ee903ad65 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -218,7 +218,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
ItemPointerSetMin(&minItem);
int nlist;
ItemPointerData *list;
- list = GinDataLeafPageGetItems(page, nlist, minItem);
+ list = GinDataLeafPageGetItems(page, &nlist, minItem);
if (stack->parentkey && ItemPointerCompare(stack->parentkey, &list[0]) < 0) {
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -395,10 +395,7 @@ gin_check_parent_keys_consistency(Relation rel)
Datum page_max_key = gintuple_get_key(&state, idxtuple, &page_max_key_category);
if (GinPageGetOpaque(page)->rightlink != InvalidBlockNumber &&
- DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
- state.supportCollation[attnum - 1],
- parent_key, page_max_key_category)) <= 0) {
-// ginCompareEntries(&state, attnum, page_max_key, parent_key, page_max_key_category, parent_key_category) <= 0 ) {
+ ginCompareEntries(&state, attnum, page_max_key, page_max_key_category, parent_key, parent_key_category) > 0 ) {
elog(INFO, "split detected");
/* split page detected, install right link to the stack */
GinScanItem *ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
@@ -448,18 +445,12 @@ gin_check_parent_keys_consistency(Relation rel)
if (i != FirstOffsetNumber && stack->blkno != (BlockNumber) 1) {
prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
-// elog(INFO, "prev_key_category %u, current_key_category %u", prev_key_category, current_key_category);
- elog(INFO, "prev_key %u, current_key %u", prev_key, current_key);
- elog(INFO, "collation id %u", state.supportCollation[0]);
- if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
- state.supportCollation[attnum - 1],
- prev_key, current_key)) >= 0)
-// if (ginCompareEntries(&state, attnum, prev_key, current_key, prev_key_category, current_key_category) >= 0)
+
+ if (ginCompareEntries(&state, attnum, prev_key, prev_key_category, current_key, current_key_category) >= 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has wrong tuple order, block %u, offset %u",
RelationGetRelationName(rel), stack->blkno, i)));
-
}
/*
@@ -473,15 +464,8 @@ gin_check_parent_keys_consistency(Relation rel)
// elog(INFO, "parent_key %lu, current_key %lu", parent_key, current_key);
// elog(INFO, "parent_key %lu, current_key %lu", DatumGetInt32(parent_key), DatumGetInt32(current_key));
// elog(INFO, "comparision result %i", ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category));
-// elog(INFO, "comparision result %i", DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
-// state.supportCollation[attnum - 1],
-// current_key, parent_key)));
-
-// if (ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category) < 0) {
- if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
- state.supportCollation[attnum - 1],
- current_key, parent_key)) < 0){
+ if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) > 0) {
/*
* There was a discrepancy between parent and child tuples.
* We need to verify it is not a result of concurrent call of
@@ -500,10 +484,8 @@ gin_check_parent_keys_consistency(Relation rel)
else {
parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
elog(INFO, "parent_key %u, current_key %u after gin_refind_parent", parent_key, current_key);
-// if (ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category) <0)
- if (DatumGetInt32(DirectFunctionCall2Coll(btint4cmp,
- state.supportCollation[attnum - 1],
- current_key, parent_key)) < 0)
+
+ if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) >0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has inconsistent records on page %u offset %u",
@@ -534,7 +516,8 @@ gin_check_parent_keys_consistency(Relation rel)
validate_leaf(page, rel, stack->blkno);
}
- prev_tuple = idxtuple;
+
+ prev_tuple = CopyIndexTuple(idxtuple);
}
LockBuffer(buffer, GIN_UNLOCK);
--
2.17.1
From 551340cf68f2c30e4eb6dd9760d2e85b173c86d3 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Wed, 20 May 2020 23:27:56 +0500
Subject: [PATCH 14/15] fix comparision
---
contrib/amcheck/sql/check_gin.sql | 42 ++++++++++++++++++++++++++-----
contrib/amcheck/verify_gin.c | 20 ++++++++-------
2 files changed, 47 insertions(+), 15 deletions(-)
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
index f5c11ad02d..95f7036038 100644
--- a/contrib/amcheck/sql/check_gin.sql
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -4,7 +4,7 @@ CREATE TABLE "gin_check"("Column1" int[]);
-- posting trees
INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
-- posting leaves
--- INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
+INSERT INTO gin_check select array_agg(255 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
SELECT gin_index_parent_check('gin_check_idx');
@@ -13,13 +13,43 @@ DROP TABLE gin_check;
-- minimal test, basically just verifying that amcheck works with GIN
SELECT setseed(1);
-CREATE TABLE "gin_check_text"("Column1" varchar[32]);
+CREATE TABLE "gin_check"("Column1" int[]);
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+ALTER INDEX gin_check_idx SET (fastupdate = false);
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check select array_agg(100 + round(random()*255)) from generate_series(1, 100) as i group by i % 100;
+
+SELECT gin_index_parent_check('gin_check_idx');
+
+-- cleanup
+DROP TABLE gin_check;
+
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check_text_array"("Column1" text[]);
-- posting trees
-INSERT INTO gin_check_text select array_agg(md5(round(random()*300)::text)) from generate_series(1, 1000000) as i group by i % 100000;
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300)::text)::text) from generate_series(1, 1000000) as i group by i % 100000;
-- posting leaves
--- INSERT INTO gin_check select array_agg(100 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
-CREATE INDEX gin_check_idx_text on "gin_check_text" USING GIN("Column1");
-SELECT gin_index_parent_check('gin_check_idx_text');
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300 + 300)::text)::text) from generate_series(1, 10000) as i group by i % 100;
+CREATE INDEX gin_check_text_array_idx on "gin_check_text_array" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_text_array_idx');
+
+-- cleanup
+DROP TABLE gin_check_text_array;
+
+
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE gin_check_text (
+ i text
+);
+CREATE INDEX gin_check_text_idx on "gin_check_text" USING GIN(i);
+ALTER INDEX gin_check_text_idx SET (fastupdate = false);
+INSERT INTO gin_check_text select md5(random()::text) from generate_series(1, 1000);
+
+SELECT gin_index_parent_check('gin_check_text_idx');
-- cleanup
DROP TABLE gin_check_text;
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 0ee903ad65..7e56bd2e58 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -302,11 +302,11 @@ static void validate_leaf(Page page, Relation rel, BlockNumber blkno) {
ItemId iid = PageGetItemIdCareful(rel, blkno, page, i, sizeof(GinPageOpaqueData));
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
if (GinIsPostingTree(idxtuple)) {
- elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i);
+// elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i);
BlockNumber rootPostingTree = GinGetPostingTree(idxtuple);
gin_check_posting_tree_parent_keys_consistency(rel, rootPostingTree);
} else {
- elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i);
+// elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i);
ItemPointer ipd;
int nipd;
@@ -442,10 +442,11 @@ gin_check_parent_keys_consistency(Relation rel)
Datum prev_key;
GinNullCategory current_key_category;
Datum current_key = gintuple_get_key(&state, idxtuple, &current_key_category);
+// elog(INFO, "current_key %s, depth %u", DatumGetCString(current_key), stack->depth);
+ // (apparently) first block is metadata, skip order check
if (i != FirstOffsetNumber && stack->blkno != (BlockNumber) 1) {
prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
-
if (ginCompareEntries(&state, attnum, prev_key, prev_key_category, current_key, current_key_category) >= 0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -461,9 +462,6 @@ gin_check_parent_keys_consistency(Relation rel)
i == maxoff) {
GinNullCategory parent_key_category;
Datum parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
-// elog(INFO, "parent_key %lu, current_key %lu", parent_key, current_key);
-// elog(INFO, "parent_key %lu, current_key %lu", DatumGetInt32(parent_key), DatumGetInt32(current_key));
-// elog(INFO, "comparision result %i", ginCompareEntries(&state, attnum, current_key, parent_key, current_key_category, parent_key_category));
if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) > 0) {
/*
@@ -483,8 +481,6 @@ gin_check_parent_keys_consistency(Relation rel)
stack->blkno, stack->parentblk);
else {
parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
- elog(INFO, "parent_key %u, current_key %u after gin_refind_parent", parent_key, current_key);
-
if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) >0)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -506,7 +502,13 @@ gin_check_parent_keys_consistency(Relation rel)
ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
ptr->depth = stack->depth + 1;
- ptr->parenttup = CopyIndexTuple(idxtuple);
+ // last tuple in layer has no high key
+ if (i != maxoff && !GinPageGetOpaque(page)->rightlink) {
+ ptr->parenttup = CopyIndexTuple(idxtuple);
+ }
+ else {
+ ptr->parenttup = NULL;
+ }
ptr->parentblk = stack->blkno;
ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
ptr->parentlsn = lsn;
--
2.17.1
From 07bb4d03e11ae0f9cf089eb27d7bcedef24e8ea9 Mon Sep 17 00:00:00 2001
From: eien <GSKryachko@gmail.com>
Date: Mon, 25 May 2020 21:18:48 +0500
Subject: [PATCH 15/15] remove debug-time change
---
src/include/access/nbtree.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 99efaff5f9..84a7e80e24 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1153,5 +1153,4 @@ extern IndexBuildResult *btbuild(Relation heap, Relation index,
struct IndexInfo *indexInfo);
extern void _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc);
-extern Datum btint4cmp(PG_FUNCTION_ARGS);
#endif /* NBTREE_H */
--
2.17.1
On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com> wrote:
Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread: https://commitfest.postgresql.org/25/1800/ .
It adds an ability to verify validity of GIN index. It is not polished yet, but it works and we wanted to show it to you so you can give us some feedback, and also let you know about this work if you have any plans of writing something like that yourselves, so that you do not redo what is already done.
Can you rebase this patch, please?
Also suggest breaking out the series into distinct patch files using
"git format-patch master".
--
Peter Geoghegan
On 07/08/2020 00:33, Peter Geoghegan wrote:
On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com> wrote:
Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread: https://commitfest.postgresql.org/25/1800/ .
It adds an ability to verify validity of GIN index. It is not polished yet, but it works and we wanted to show it to you so you can give us some feedback, and also let you know about this work if you have any plans of writing something like that yourselves, so that you do not redo what is already done.
Can you rebase this patch, please?
Also suggest breaking out the series into distinct patch files using
"git format-patch master".
I rebased the GIN parts of this patch, see attached. I also ran pgindent
and made some other tiny cosmetic fixes, but I didn't review the patch,
only rebased it in the state it was in.
I was hoping that this would be useful to track down the bug we're
discussing here:
/messages/by-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw@mail.gmail.com.
But now that I look at what checks this performs, I doubt this will catch
the kind of corruption that's happened there. I suspect it's more subtle
than inconsistencies between parent and child pages, because only a
few rows are affected. But it doesn't hurt to try.
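For anyone who wants to give it a go against the suspect index, a rough sketch of the invocation with this patch applied would be something like the following, run as superuser ("some_gin_idx" stands in for the real index name):
-- assuming amcheck is already installed at an older version
ALTER EXTENSION amcheck UPDATE TO '1.4';
-- on a cluster without amcheck, CREATE EXTENSION amcheck; ends up at 1.4 as well
SELECT gin_index_parent_check('some_gin_idx');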
- Heikki
Attachments:
v2-0001-Amcheck-for-GIN.patch (text/x-patch; charset=UTF-8)
From 38ef30684b0248bae0d0e82e3919118ed629613f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 15 Jul 2021 08:51:35 +0300
Subject: [PATCH v2 1/1] Amcheck for GIN
Author: Grigory Kryachko
Discussion: https://www.postgresql.org/message-id/CAF3eApa07-BajjG8%2BRYx-Dr_cq28ZA0GsZmUQrGu5b2ayRhB5A%40mail.gmail.com
---
contrib/amcheck/Makefile | 8 +-
contrib/amcheck/amcheck--1.3--1.4.sql | 15 +
contrib/amcheck/amcheck.c | 161 ++++++
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 23 +
contrib/amcheck/expected/check_gin.out | 60 +++
contrib/amcheck/sql/check_gin.sql | 40 ++
contrib/amcheck/verify_gin.c | 695 +++++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 19 +
9 files changed, 1020 insertions(+), 3 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.3--1.4.sql
create mode 100644 contrib/amcheck/amcheck.c
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_gin.out
create mode 100644 contrib/amcheck/sql/check_gin.sql
create mode 100644 contrib/amcheck/verify_gin.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index b82f221e50b..3418c30c50b 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,14 +3,18 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ amcheck.o \
+ verify_gin.o \
verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.3--1.4.sql amcheck--1.2--1.3.sql \
+ amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql \
+ amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree check_heap
+REGRESS = check check_btree check_heap check_gin
TAP_TESTS = 1
diff --git a/contrib/amcheck/amcheck--1.3--1.4.sql b/contrib/amcheck/amcheck--1.3--1.4.sql
new file mode 100644
index 00000000000..60d9dd1ab94
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.3--1.4.sql
@@ -0,0 +1,15 @@
+/* contrib/amcheck/amcheck--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.4'" to load this file. \quit
+
+
+--
+-- gin_index_parent_check()
+--
+CREATE FUNCTION gin_index_parent_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'gin_index_parent_check'
+LANGUAGE C STRICT;
+
+REVOKE ALL ON FUNCTION gin_index_parent_check(regclass) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
new file mode 100644
index 00000000000..e068b31b431
--- /dev/null
+++ b/contrib/amcheck/amcheck.c
@@ -0,0 +1,161 @@
+/*-------------------------------------------------------------------------
+ *
+ * amcheck.c
+ * Utility functions common to all access methods.
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/table.h"
+#include "access/tableam.h"
+#include "amcheck.h"
+#include "catalog/index.h"
+#include "commands/tablecmds.h"
+
+
+/*
+ * Lock acquisition reused across different am types
+ */
+void
+amcheck_lock_relation(Oid indrelid, Relation *indrel,
+ Relation *heaprel, LOCKMODE lockmode)
+{
+ Oid heapid;
+
+ /*
+ * We must lock table before index to avoid deadlocks. However, if the
+ * passed indrelid isn't an index then IndexGetRelation() will fail.
+ * Rather than emitting a not-very-helpful error message, postpone
+ * complaining, expecting that the is-it-an-index test below will fail.
+ *
+ * In hot standby mode this will raise an error when lockmode is
+ * AccessExclusiveLock.
+ */
+ heapid = IndexGetRelation(indrelid, true);
+ if (OidIsValid(heapid))
+ *heaprel = table_open(heapid, lockmode);
+ else
+ *heaprel = NULL;
+
+ /*
+ * Open the target index relations separately (like relation_openrv(), but
+ * with heap relation locked first to prevent deadlocking). In hot
+ * standby mode this will raise an error when lockmode is
+ * AccessExclusiveLock.
+ *
+ * There is no need for the usual indcheckxmin usability horizon test
+ * here, even in the nbtree heapallindexed case, because index undergoing
+ * verification only needs to have entries for a new transaction snapshot.
+ * (If caller is about to do nbtree parentcheck verification, there is no
+ * question about committed or recently dead heap tuples lacking index
+ * entries due to concurrent activity.)
+ */
+ *indrel = index_open(indrelid, lockmode);
+
+ /*
+ * Since we did the IndexGetRelation call above without any lock, it's
+ * barely possible that a race against an index drop/recreation could have
+ * netted us the wrong table.
+ */
+ if (*heaprel == NULL || heapid != IndexGetRelation(indrelid, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_TABLE),
+ errmsg("could not open parent table of index %s",
+ RelationGetRelationName(*indrel))));
+}
+
+/*
+ * Unlock index and heap relations early for amcheck_lock_relation() caller.
+ *
+ * This is ok because nothing in the called routines will trigger shared cache
+ * invalidations to be sent, so we can relax the usual pattern of only
+ * releasing locks after commit.
+ */
+void
+amcheck_unlock_relation(Oid indrelid, Relation indrel, Relation heaprel,
+ LOCKMODE lockmode)
+{
+ index_close(indrel, lockmode);
+ if (heaprel)
+ table_close(heaprel, lockmode);
+}
+
+/*
+ * Check if index relation should have a file for its main relation
+ * fork. Verification uses this to skip unlogged indexes when in hot standby
+ * mode, where there is simply nothing to verify.
+ *
+ * NB: Caller should call btree_index_checkable() or gist_index_checkable()
+ * before calling here.
+ */
+bool
+amcheck_index_mainfork_expected(Relation rel)
+{
+ if (rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED ||
+ !RecoveryInProgress())
+ return true;
+
+ ereport(NOTICE,
+ (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+ errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",
+ RelationGetRelationName(rel))));
+
+ return false;
+}
+
+/*
+ * PageGetItemId() wrapper that validates returned line pointer.
+ *
+ * Buffer page/page item access macros generally trust that line pointers are
+ * not corrupt, which might cause problems for verification itself. For
+ * example, there is no bounds checking in PageGetItem(). Passing it a
+ * corrupt line pointer can cause it to return a tuple/pointer that is unsafe
+ * to dereference.
+ *
+ * Validating line pointers before tuples avoids undefined behavior and
+ * assertion failures with corrupt indexes, making the verification process
+ * more robust and predictable.
+ */
+ItemId
+PageGetItemIdCareful(Relation rel, BlockNumber block, Page page,
+ OffsetNumber offset, size_t opaquesize)
+{
+ ItemId itemid = PageGetItemId(page, offset);
+
+ if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
+ BLCKSZ - opaquesize)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("line pointer points past end of tuple space in index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid))));
+
+ /*
+ * Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree and gist
+ * never use either. Verify that line pointer has storage, too, since
+ * even LP_DEAD items should.
+ */
+// TODO adapt for gin and uncomment
+// if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
+// ItemIdGetLength(itemid) == 0)
+// ereport(ERROR,
+// (errcode(ERRCODE_INDEX_CORRUPTED),
+// errmsg("invalid line pointer storage in index \"%s\"",
+// RelationGetRelationName(rel)),
+// errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
+// block, offset, ItemIdGetOffset(itemid),
+// ItemIdGetLength(itemid),
+// ItemIdGetFlags(itemid))));
+
+ return itemid;
+}
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index ab50931f754..e67ace01c99 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 00000000000..b19e6171778
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * amcheck.h
+ * Shared routines for amcheck verifications.
+ *
+ * Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "storage/lockdefs.h"
+#include "utils/relcache.h"
+
+extern void amcheck_lock_relation(Oid indrelid, Relation *indrel,
+ Relation *heaprel, LOCKMODE lockmode);
+extern void amcheck_unlock_relation(Oid indrelid, Relation indrel,
+ Relation heaprel, LOCKMODE lockmode);
+extern bool amcheck_index_mainfork_expected(Relation rel);
+
+extern ItemId PageGetItemIdCareful(Relation rel, BlockNumber block,
+ Page page, OffsetNumber offset, size_t opaquesize);
\ No newline at end of file
diff --git a/contrib/amcheck/expected/check_gin.out b/contrib/amcheck/expected/check_gin.out
new file mode 100644
index 00000000000..35e3f0dc330
--- /dev/null
+++ b/contrib/amcheck/expected/check_gin.out
@@ -0,0 +1,60 @@
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE "gin_check"("Column1" int[]);
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check select array_agg(255 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_idx');
+ gin_index_parent_check
+------------------------
+
+(1 row)
+
+-- cleanup
+DROP TABLE gin_check;
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE "gin_check"("Column1" int[]);
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+ALTER INDEX gin_check_idx SET (fastupdate = false);
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check select array_agg(100 + round(random()*255)) from generate_series(1, 100) as i group by i % 100;
+SELECT gin_index_parent_check('gin_check_idx');
+ gin_index_parent_check
+------------------------
+
+(1 row)
+
+-- cleanup
+DROP TABLE gin_check;
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE "gin_check_text_array"("Column1" text[]);
+-- posting trees
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300)::text)::text) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300 + 300)::text)::text) from generate_series(1, 10000) as i group by i % 100;
+CREATE INDEX gin_check_text_array_idx on "gin_check_text_array" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_text_array');
+ERROR: "gin_check_text_array" is not an index
+-- cleanup
+DROP TABLE gin_check_text_array;
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
new file mode 100644
index 00000000000..c81db477f11
--- /dev/null
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -0,0 +1,40 @@
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check"("Column1" int[]);
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check select array_agg(255 + round(random()*100)) from generate_series(1, 100) as i group by i % 100;
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_idx');
+
+-- cleanup
+DROP TABLE gin_check;
+
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check"("Column1" int[]);
+CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
+ALTER INDEX gin_check_idx SET (fastupdate = false);
+-- posting trees
+INSERT INTO gin_check select array_agg(round(random()*255) ) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check select array_agg(100 + round(random()*255)) from generate_series(1, 100) as i group by i % 100;
+
+SELECT gin_index_parent_check('gin_check_idx');
+
+-- cleanup
+DROP TABLE gin_check;
+
+-- minimal test, basically just verifying that amcheck works with GIN
+SELECT setseed(1);
+CREATE TABLE "gin_check_text_array"("Column1" text[]);
+-- posting trees
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300)::text)::text) from generate_series(1, 1000000) as i group by i % 100000;
+-- posting leaves
+INSERT INTO gin_check_text_array select array_agg(md5(round(random()*300 + 300)::text)::text) from generate_series(1, 10000) as i group by i % 100;
+CREATE INDEX gin_check_text_array_idx on "gin_check_text_array" USING GIN("Column1");
+SELECT gin_index_parent_check('gin_check_text_array');
+
+-- cleanup
+DROP TABLE gin_check_text_array;
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
new file mode 100644
index 00000000000..65104f09ce6
--- /dev/null
+++ b/contrib/amcheck/verify_gin.c
@@ -0,0 +1,695 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_gin.c
+ * Verifies the integrity of GIN indexes based on invariants.
+ *
+ * Verification checks that all paths in the GIN graph contain
+ * consistent keys: tuples on parent pages consistently include tuples
+ * from child pages. Also, verification checks graph invariants: an
+ * internal page must have at least one downlink, and an internal page
+ * can reference either only leaf pages or only internal pages.
+ *
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/verify_gin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "access/nbtree.h"
+#include "amcheck.h"
+#include "catalog/pg_am.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "string.h"
+
+/*
+ * GinScanItem represents one item of depth-first scan of GIN index.
+ */
+typedef struct GinScanItem
+{
+ int depth;
+ IndexTuple parenttup;
+ BlockNumber parentblk;
+ XLogRecPtr parentlsn;
+ BlockNumber blkno;
+ struct GinScanItem *next;
+} GinScanItem;
+
+/*
+ * GinPostingTreeScanItem represents one item of depth-first scan of GIN posting tree.
+ */
+typedef struct GinPostingTreeScanItem
+{
+ int depth;
+ ItemPointer parentkey;
+ BlockNumber blkno;
+ struct GinPostingTreeScanItem *next;
+} GinPostingTreeScanItem;
+
+
+PG_FUNCTION_INFO_V1(gin_index_parent_check);
+
+static void gin_index_checkable(Relation rel);
+static void gin_check_parent_keys_consistency(Relation rel);
+static void check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo);
+static IndexTuple gin_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno,
+ BufferAccessStrategy strategy);
+
+/*
+ * gin_index_parent_check(index regclass)
+ *
+ * Verify integrity of GIN index.
+ *
+ * Acquires AccessShareLock on heap & index relations.
+ */
+Datum
+gin_index_parent_check(PG_FUNCTION_ARGS)
+{
+ Oid indrelid = PG_GETARG_OID(0);
+ Relation indrel;
+ Relation heaprel;
+ LOCKMODE lockmode = AccessShareLock;
+
+ /* lock table and index at the necessary level */
+ amcheck_lock_relation(indrelid, &indrel, &heaprel, lockmode);
+
+ /* verify that this is a GIN index eligible for checking */
+ gin_index_checkable(indrel);
+
+ if (amcheck_index_mainfork_expected(indrel))
+ gin_check_parent_keys_consistency(indrel);
+
+ /* Unlock index and table */
+ amcheck_unlock_relation(indrelid, indrel, heaprel, lockmode);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Read item pointers from leaf entry tuple.
+ *
+ * Returns a palloc'd array of ItemPointers. The number of items is returned
+ * in *nitems.
+ */
+static ItemPointer
+ginReadTupleWithoutState(IndexTuple itup, int *nitems)
+{
+ Pointer ptr = GinGetPosting(itup);
+ int nipd = GinGetNPosting(itup);
+ ItemPointer ipd;
+ int ndecoded;
+
+ if (GinItupIsCompressed(itup))
+ {
+ if (nipd > 0)
+ {
+ ipd = ginPostingListDecode((GinPostingList *) ptr, &ndecoded);
+ if (nipd != ndecoded)
+ elog(ERROR, "number of items mismatch in GIN entry tuple, %d in tuple header, %d decoded",
+ nipd, ndecoded);
+ }
+ else
+ {
+ ipd = palloc(0);
+ }
+ }
+ else
+ {
+ ipd = (ItemPointer) palloc(sizeof(ItemPointerData) * nipd);
+ memcpy(ipd, ptr, sizeof(ItemPointerData) * nipd);
+ }
+ *nitems = nipd;
+ return ipd;
+}
+
+
+/*
+ * Check that relation is eligible for GIN verification
+ */
+static void
+gin_index_checkable(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_INDEX ||
+ rel->rd_rel->relam != GIN_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only GIN indexes are supported as targets for this verification"),
+ errdetail("Relation \"%s\" is not a GIN index.",
+ RelationGetRelationName(rel))));
+
+ if (RELATION_IS_OTHER_TEMP(rel))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot access temporary tables of other sessions"),
+ errdetail("Index \"%s\" is associated with temporary relation.",
+ RelationGetRelationName(rel))));
+
+ if (!rel->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot check index \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Index is not valid")));
+}
+
+/*
+ * Allocates a memory context and scans through the posting tree graph
+ *
+ */
+static void
+gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting_tree_root)
+{
+ BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
+ GinPostingTreeScanItem *stack;
+ MemoryContext mctx;
+ MemoryContext oldcontext;
+
+ int leafdepth;
+
+ mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(mctx);
+
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
+ leafdepth = -1;
+
+ /* Start the scan at the root page */
+ stack = (GinPostingTreeScanItem *) palloc0(sizeof(GinPostingTreeScanItem));
+ stack->depth = 0;
+ stack->parentkey = NULL;
+ stack->blkno = posting_tree_root;
+
+ while (stack)
+ {
+ GinPostingTreeScanItem *stack_next;
+ Buffer buffer;
+ Page page;
+ OffsetNumber i,
+ maxoff;
+
+ //elog(INFO, "processing block %u", stack->blkno);
+
+ CHECK_FOR_INTERRUPTS();
+
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
+ RBM_NORMAL, strategy);
+ LockBuffer(buffer, GIN_SHARE);
+ page = (Page) BufferGetPage(buffer);
+ Assert(GinPageIsData(page));
+
+ /* Check that the tree has the same height in all branches */
+ if (GinPageIsLeaf(page))
+ {
+ ItemPointerData minItem;
+ int nlist;
+ ItemPointerData *list;
+
+ ItemPointerSetMin(&minItem);
+
+ if (leafdepth == -1)
+ leafdepth = stack->depth;
+ else if (stack->depth != leafdepth)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ list = GinDataLeafPageGetItems(page, &nlist, minItem);
+ if (stack->parentkey && ItemPointerCompare(stack->parentkey, &list[0]) < 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": tid exceeds parent's high key in postingTree leaf on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
+ }
+ else
+ {
+
+
+ /*
+ * Check that tuples in each page are properly ordered and
+ * consistent with parent high key
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ PostingItem *posting_item = GinDataPageGetPostingItem(page, i);
+
+ if (ItemPointerGetBlockNumberNoCheck(&posting_item->key) == 0 ||
+ ItemPointerGetOffsetNumberNoCheck(&posting_item->key) == 0)
+ {
+ continue;
+ }
+ elog(INFO, "key block %u offset %u", ItemPointerGetBlockNumber(&posting_item->key),
+ ItemPointerGetOffsetNumber(&posting_item->key));
+
+ if (i != FirstOffsetNumber)
+ {
+ PostingItem *previous_posting_item = GinDataPageGetPostingItem(page, i - 1);
+
+ if (ItemPointerCompare(&posting_item->key, &previous_posting_item->key) < 0)
+ {
+
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has wrong tuple order in posting tree, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ }
+
+ }
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+ if (stack->parentkey && i == maxoff)
+ {
+ if (ItemPointerCompare(stack->parentkey, &posting_item->key) < 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ }
+ }
+
+ /* If this is an internal page, recurse into the child */
+ if (!GinPageIsLeaf(page))
+ {
+ GinPostingTreeScanItem *ptr;
+
+ ptr = (GinPostingTreeScanItem *) palloc(sizeof(GinPostingTreeScanItem));
+ ptr->depth = stack->depth + 1;
+ ptr->parentkey = &posting_item->key;
+ ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+
+ }
+ }
+ LockBuffer(buffer, GIN_UNLOCK);
+ ReleaseBuffer(buffer);
+
+ /* Step to next item in the queue */
+ stack_next = stack->next;
+ pfree(stack);
+ stack = stack_next;
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(mctx);
+}
+
+static void
+validate_leaf(Page page, Relation rel, BlockNumber blkno)
+{
+ OffsetNumber i,
+ maxoff;
+
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ ItemId iid = PageGetItemIdCareful(rel, blkno, page, i, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+
+ if (GinIsPostingTree(idxtuple))
+ {
+/* elog(INFO, "validating posting tree on page %u, block %u, offset %u", page, blkno, i); */
+ BlockNumber rootPostingTree = GinGetPostingTree(idxtuple);
+
+ gin_check_posting_tree_parent_keys_consistency(rel, rootPostingTree);
+ }
+ else
+ {
+/* elog(INFO, "validating posting list on page %u, block %u, offset %u", page, blkno, i); */
+
+ ItemPointer ipd;
+ int nipd;
+
+ ipd = ginReadTupleWithoutState(idxtuple, &nipd);
+ if (!OffsetNumberIsValid(ItemPointerGetOffsetNumber(&ipd[nipd - 1])))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": posting list contains invalid heap pointer on block %u",
+ RelationGetRelationName(rel), blkno)));
+ }
+ pfree(ipd);
+ }
+ }
+}
+
+/*
+ * Main entry point for GIN check. Allocates memory context and scans through
+ * GIN graph.
+ */
+static void
+gin_check_parent_keys_consistency(Relation rel)
+{
+ BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
+ GinScanItem *stack;
+ MemoryContext mctx;
+ MemoryContext oldcontext;
+ GinState state;
+
+ int leafdepth;
+
+ mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(mctx);
+ initGinState(&state, rel);
+
+ /*
+ * We don't know the height of the tree yet, but as soon as we encounter a
+ * leaf page, we will set 'leafdepth' to its depth.
+ */
+ leafdepth = -1;
+
+ /* Start the scan at the root page */
+ stack = (GinScanItem *) palloc0(sizeof(GinScanItem));
+ stack->depth = 0;
+ stack->parenttup = NULL;
+ stack->parentblk = InvalidBlockNumber;
+ stack->parentlsn = InvalidXLogRecPtr;
+ stack->blkno = GIN_ROOT_BLKNO;
+
+ while (stack)
+ {
+ GinScanItem *stack_next;
+ Buffer buffer;
+ Page page;
+ OffsetNumber i,
+ maxoff;
+ XLogRecPtr lsn;
+ IndexTuple prev_tuple;
+
+ //elog(INFO, "processing block %u", stack->blkno);
+
+ CHECK_FOR_INTERRUPTS();
+
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
+ RBM_NORMAL, strategy);
+ LockBuffer(buffer, GIN_SHARE);
+ page = (Page) BufferGetPage(buffer);
+ lsn = BufferGetLSNAtomic(buffer);
+
+ /* Do basic sanity checks on the page headers */
+ check_index_page(rel, buffer, stack->blkno);
+
+ /*
+ * It's possible that the page was split since we looked at the
+ * parent, so that we did not see the downlink of the right sibling
+ * when we scanned the parent. If so, add the right sibling to the
+ * stack now.
+ */
+ if (stack->parenttup != NULL)
+ {
+ GinNullCategory parent_key_category;
+ Datum parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, maxoff, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+ OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
+ GinNullCategory page_max_key_category;
+ Datum page_max_key = gintuple_get_key(&state, idxtuple, &page_max_key_category);
+
+ if (GinPageGetOpaque(page)->rightlink != InvalidBlockNumber &&
+ ginCompareEntries(&state, attnum, page_max_key, page_max_key_category, parent_key, parent_key_category) > 0)
+ {
+ /* split page detected, install right link to the stack */
+ GinScanItem *ptr;
+
+ elog(INFO, "split detected");
+
+ ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
+ ptr->depth = stack->depth;
+ ptr->parenttup = CopyIndexTuple(stack->parenttup);
+ ptr->parentblk = stack->parentblk;
+ ptr->parentlsn = stack->parentlsn;
+ ptr->blkno = GinPageGetOpaque(page)->rightlink;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+ }
+
+ /* Check that the tree has the same height in all branches */
+ if (GinPageIsLeaf(page))
+ {
+ if (leafdepth == -1)
+ leafdepth = stack->depth;
+ else if (stack->depth != leafdepth)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\": internal pages traversal encountered leaf page unexpectedly on block %u",
+ RelationGetRelationName(rel), stack->blkno)));
+ }
+
+ /*
+ * Check that tuples in each page are properly ordered and consistent
+ * with parent high key
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ prev_tuple = NULL;
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ ItemId iid = PageGetItemIdCareful(rel, stack->blkno, page, i, sizeof(GinPageOpaqueData));
+ IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
+ OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
+ GinNullCategory prev_key_category;
+ Datum prev_key;
+ GinNullCategory current_key_category;
+ Datum current_key;
+
+ if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent tuple sizes, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+
+ current_key = gintuple_get_key(&state, idxtuple, &current_key_category);
+/* elog(INFO, "current_key %s, depth %u", DatumGetCString(current_key), stack->depth); */
+
+ /* (apparently) first block is metadata, skip order check */
+ if (i != FirstOffsetNumber && stack->blkno != (BlockNumber) 1)
+ {
+ prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
+ if (ginCompareEntries(&state, attnum, prev_key, prev_key_category, current_key, current_key_category) >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has wrong tuple order, block %u, offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ }
+
+ /*
+ * Check if this tuple is consistent with the downlink in the
+ * parent.
+ */
+ if (stack->parenttup &&
+ i == maxoff)
+ {
+ GinNullCategory parent_key_category;
+ Datum parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
+
+ if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) > 0)
+ {
+ /*
+ * There was a discrepancy between parent and child
+ * tuples. We need to verify that it is not a result of
+ * a concurrent page split. So, lock the parent and try
+ * to find the downlink for the current page. It may be
+ * missing due to a concurrent page split; this is OK.
+ */
+ pfree(stack->parenttup);
+ stack->parenttup = gin_refind_parent(rel, stack->parentblk,
+ stack->blkno, strategy);
+
+ /* If the downlink is still missing, assume a concurrent split; otherwise re-check */
+ if (!stack->parenttup)
+ elog(NOTICE, "Unable to find parent tuple for block %u on block %u due to concurrent split",
+ stack->blkno, stack->parentblk);
+ else
+ {
+ parent_key = gintuple_get_key(&state, stack->parenttup, &parent_key_category);
+ if (ginCompareEntries(&state, attnum, current_key, current_key_category, parent_key, parent_key_category) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has inconsistent records on page %u offset %u",
+ RelationGetRelationName(rel), stack->blkno, i)));
+ else
+ {
+ /*
+ * But now it is properly adjusted - nothing to do
+ * here.
+ */
+ }
+ }
+ }
+ }
+
+ /* If this is an internal page, recurse into the child */
+ if (!GinPageIsLeaf(page))
+ {
+ GinScanItem *ptr;
+
+ ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
+ ptr->depth = stack->depth + 1;
+ /* last tuple in layer has no high key */
+ if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
+ {
+ ptr->parenttup = CopyIndexTuple(idxtuple);
+ }
+ else
+ {
+ ptr->parenttup = NULL;
+ }
+ ptr->parentblk = stack->blkno;
+ ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
+ ptr->parentlsn = lsn;
+ ptr->next = stack->next;
+ stack->next = ptr;
+ }
+ else
+ {
+ validate_leaf(page, rel, stack->blkno);
+ }
+
+
+ prev_tuple = CopyIndexTuple(idxtuple);
+ }
+
+ LockBuffer(buffer, GIN_UNLOCK);
+ ReleaseBuffer(buffer);
+
+ /* Step to next item in the queue */
+ stack_next = stack->next;
+ if (stack->parenttup)
+ pfree(stack->parenttup);
+ pfree(stack);
+ stack = stack_next;
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(mctx);
+}
+
+/*
+ * Verify that a freshly-read page looks sane.
+ */
+static void
+gincheckpage(Relation rel, Buffer buf)
+{
+ Page page = BufferGetPage(buf);
+
+ /*
+ * ReadBuffer verifies that every newly-read page passes
+ * PageHeaderIsValid, which means it either contains a reasonably sane
+ * page header or is all-zero. We have to defend against the all-zero
+ * case, however.
+ */
+ if (PageIsNew(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" contains unexpected zero page at block %u",
+ RelationGetRelationName(rel),
+ BufferGetBlockNumber(buf)),
+ errhint("Please REINDEX it.")));
+
+ /*
+ * Additionally check that the special area looks sane.
+ */
+ if (PageGetSpecialSize(page) != MAXALIGN(sizeof(GinPageOpaqueData)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" contains corrupted page at block %u",
+ RelationGetRelationName(rel),
+ BufferGetBlockNumber(buf)),
+ errhint("Please REINDEX it.")));
+}
+
+static void
+check_index_page(Relation rel, Buffer buffer, BlockNumber blockNo)
+{
+ Page page = BufferGetPage(buffer);
+
+ gincheckpage(rel, buffer);
+
+ if (GinPageIsDeleted(page))
+ {
+ if (!GinPageIsLeaf(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted internal page %d",
+ RelationGetRelationName(rel), blockNo)));
+ if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has deleted page %d with tuples",
+ RelationGetRelationName(rel), blockNo)));
+ }
+ else if (PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index \"%s\" has page %d with exceeding count of tuples",
+ RelationGetRelationName(rel), blockNo)));
+}
+
+/*
+ * Try to re-find downlink pointing to 'blkno', in 'parentblkno'.
+ *
+ * If found, returns a palloc'd copy of the downlink tuple. Otherwise,
+ * returns NULL.
+ */
+static IndexTuple
+gin_refind_parent(Relation rel, BlockNumber parentblkno,
+ BlockNumber childblkno, BufferAccessStrategy strategy)
+{
+ Buffer parentbuf;
+ Page parentpage;
+ OffsetNumber o,
+ parent_maxoff;
+ IndexTuple result = NULL;
+
+ parentbuf = ReadBufferExtended(rel, MAIN_FORKNUM, parentblkno, RBM_NORMAL,
+ strategy);
+
+ LockBuffer(parentbuf, GIN_SHARE);
+ parentpage = BufferGetPage(parentbuf);
+
+ if (GinPageIsLeaf(parentpage))
+ {
+ UnlockReleaseBuffer(parentbuf);
+ return result;
+ }
+
+ parent_maxoff = PageGetMaxOffsetNumber(parentpage);
+ for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(o))
+ {
+ ItemId p_iid = PageGetItemIdCareful(rel, parentblkno, parentpage, o, sizeof(GinPageOpaqueData));
+ IndexTuple itup = (IndexTuple) PageGetItem(parentpage, p_iid);
+
+ if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
+ {
+ /* Found it! Make copy and return it */
+ result = CopyIndexTuple(itup);
+ break;
+ }
+ }
+
+ UnlockReleaseBuffer(parentbuf);
+
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a2571d33ae6..ff3ce555f12 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -343,6 +343,25 @@ SET client_min_messages = DEBUG1;
</variablelist>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>gin_index_parent_check(index regclass) returns void</function>
+ <indexterm>
+ <primary>gin_index_parent_check</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>gin_index_parent_check</function> tests that its target GIN index
+ has consistent parent-child tuple relations (no parent tuple
+ requires adjustment) and that its page graph respects balanced-tree
+ invariants (internal pages reference either only leaf pages or only
+ internal pages).
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
--
2.30.2
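For anyone who wants to try the patch, here is a minimal usage sketch (not
part of the patch itself; the table and index follow the regression test,
and it assumes the patched amcheck providing gin_index_parent_check() is
installed):

CREATE EXTENSION IF NOT EXISTS amcheck;

CREATE TABLE gin_check("Column1" int[]);
INSERT INTO gin_check
    SELECT array_agg(round(random() * 255))
    FROM generate_series(1, 1000) AS i GROUP BY i % 100;
CREATE INDEX gin_check_idx ON gin_check USING GIN("Column1");

-- Returns an empty result on success; raises an ERROR with
-- ERRCODE_INDEX_CORRUPTED if an inconsistency is found.
SELECT gin_index_parent_check('gin_check_idx');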
Hello,
First of all, thank you all -- Andrey, Peter, Heikki and others -- for this
work; GIN support in amcheck is *really* needed, especially for OS upgrades
such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04.
I was trying to check a bunch of GINs on some production after switching
from Ubuntu 16.04 to 18.04 and got many errors. So I decided to check on
16.04 first (which is still used on prod for that DB), without any OS/glibc
changes.
On 16.04, I still saw errors, which was not really expected, because it
would mean that production is corrupted too. In that case, REINDEX should
fix it. But it didn't -- see output below. I cannot share the data and am
thinking about how to create a synthetic demo of this. Any suggestions?
And is this a sign that the tool is wrong rather than that we have real
corruption? (I assume that if we did, we would see no errors after
REINDEXing -- assuming, of course, that GIN itself doesn't have bugs.)
Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12
(https://gitlab.com/postgres/postgres/-/merge_requests/5). Snippet used to
run amcheck: https://gitlab.com/-/snippets/2001962 (see file #3).
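In essence, the snippet loops over the GIN indexes and runs the check on
each of them; a rough sketch of that shape (a guess only, not the actual
file #3, and it assumes the patched amcheck is installed) looks like this:

DO $$
DECLARE
    idx regclass;
BEGIN
    -- Enumerate the GIN indexes in the current database and check each one.
    -- Simplified: no filtering of invalid or other-session temp indexes,
    -- and the first corrupted index aborts the whole block with an ERROR.
    FOR idx IN
        SELECT c.oid::regclass
        FROM pg_class c
        JOIN pg_am am ON am.oid = c.relam
        WHERE c.relkind = 'i' AND am.amname = 'gin'
    LOOP
        RAISE INFO 'checking %', idx;
        PERFORM gin_index_parent_check(idx);
    END LOOP;
END $$;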
Before REINDEX:
INFO: [2021-07-29 17:44:42.525+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...
ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset
232
test=# reindex index index_XXX_trigram;
REINDEX
After REINDEX:
INFO: [2021-07-29 18:01:23.339+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...
ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset
253
On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/08/2020 00:33, Peter Geoghegan wrote:
On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com>
wrote:
Here is the patch which I (with Andrey as my advisor) built on the top
of the last patch from this thread:
https://commitfest.postgresql.org/25/1800/ .It adds an ability to verify validity of GIN index. It is not polished
yet, but it works and we wanted to show it to you so you can give us some
feedback, and also let you know about this work if you have any plans of
writing something like that yourselves, so that you do not redo what is
already done.Can you rebase this patch, please?
Also suggest breaking out the series into distinct patch files using
"git format-patch master".I rebased the GIN parts of this patch, see attached. I also ran pgindent
and made some other tiny cosmetic fixes, but I didn't review the patch,
only rebased it in the state it was.
I was hoping that this would be useful to track down the bug we're
discussing here:
/messages/by-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw@mail.gmail.com.
But now that I look at what checks this performs, I doubt this will catch
the kind of corruption that's happened there. I suspect it's more subtle
than an inconsistency between parent and child pages, because only a
few rows are affected. But doesn't hurt to try.
- Heikki
On 29/07/2021 21:34, Nikolay Samokhvalov wrote:
I was trying to check a bunch of GINs on some production after switching
from Ubuntu 16.04 to 18.04 and got many errors. So I decided to check on
16.04 first (which is still used on prod for that DB), without any OS/glibc
changes.
On 16.04, I still saw errors, which was not really expected, because it
would mean that production is corrupted too. In that case, REINDEX should
fix it. But it didn't -- see output below. I cannot share the data and am
thinking about how to create a synthetic demo of this. Any suggestions?
And is this a sign that the tool is wrong rather than that we have real
corruption? (I assume that if we did, we would see no errors after
REINDEXing -- assuming, of course, that GIN itself doesn't have bugs.)
Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12
(https://gitlab.com/postgres/postgres/-/merge_requests/5). Snippet used to
run amcheck: https://gitlab.com/-/snippets/2001962 (see file #3).
Almost certainly the tool is wrong. We went back and forth a few times
with Pawel, fixing various bugs in the amcheck patch at this thread:
/messages/by-id/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf@iki.fi.
Can you try again with the latest patch version from that thread,
please? That's v5-0001-Amcheck-for-GIN-13stable.patch.
- Heikki
Thank you, v5 didn't find any issues at all. One thing: for my 29 indexes,
the tool generated 3.5 GiB of output. I guess many INFO messages should be
downgraded to something like DEBUG1?